Posted to dev@beam.apache.org by Frances Perry <fj...@google.com.INVALID> on 2016/02/09 16:46:02 UTC

status update

Hi Beamers!

Here’s the Apache Beam: Technical Vision
<https://docs.google.com/document/d/1UyAeugHxZmVlQ5cEWo_eOPgXNQA1oD-rGooWOSwAqh8/edit?pref=2&pli=1#heading=h.e5s64nliyukh>
document I shared last week with a number of you. (Now we have a dev@ list
to share it more widely -- yay!)

I just wanted to give you a little visibility into some of the work we’ve
been doing within Google over the last week:

* Refactoring the DataflowJavaSDK: We’re hard at work separating out the
user-facing portions of the DataflowJavaSDK from the Google-specific worker
harness. This will ensure that all runners (Cloud Dataflow, Spark, Flink)
are on equal footing with clear APIs to implement. Due to the complications
that come with doing that while supporting our current users, we won’t be
able to push those changes to GitHub for a couple of weeks or so.

* Repository structure: As we get ready to start moving different chunks of
code into the new repo, we need to figure out the right way to structure
it. Here’s a proposal
<https://docs.google.com/document/d/1mTeZED33Famq25XedbKeDlGIJRvtzCXjSfwH9NKQYUE/edit?usp=sharing>
-- please provide feedback!

* Issue tracking: Thanks to JB for getting the Beam JIRA
<https://issues.apache.org/jira/browse/BEAM/> set up. We were thinking that
it makes sense to put in components that match the repository structure
(see above). And then we’ll go ahead and start transitioning our internal
Google bug tracking into JIRA.

Frances

Re: status update

Posted by Frances Perry <fj...@google.com.INVALID>.
Ah, I bet I should have included the links directly ;-)

Apache Beam: Technical Vision
https://docs.google.com/document/d/1UyAeugHxZmVlQ5cEWo_eOPgXNQA1oD-rGooWOSwAqh8/

Apache Beam: Repository Structure
https://docs.google.com/document/d/1mTeZED33Famq25XedbKeDlGIJRvtzCXjSfwH9NKQYUE/


Re: status update

Posted by Davor Bonaci <da...@google.com.INVALID>.
Just to close the loop -- I have updated the document based on the latest
round of feedback.

Now that the initial code has been dropped and we have a few pull requests,
we'll start merging those according to the plan.

On Wed, Feb 24, 2016 at 5:39 AM, Sandeep Deshmukh <sa...@datatorrent.com>
wrote:

> Thanks Davor for the detailed explanation of the thought process. That
> certainly helps.
>
> I do favor feature branches, as they are a must for major work. The proposed
> development/feature branch for the time being is perfectly fine. Just a
> caution that keeping master updated could be taxing. We can certainly
> revisit this later once the project is stable in Incubation.
>
> re: timing of major releases
> With Beam being new and rapidly evolving, it makes sense to favor growth
> right now.
>
> Regards
> Sandeep
>
>
> On Fri, Feb 19, 2016 at 1:42 AM, Davor Bonaci <da...@google.com.invalid>
> wrote:
>
> > Thanks Sandeep for your insightful comments! I'll add a little more detail
> > to follow up on Frances' email to clarify what we were proposing.
> >
> > We were indeed proposing the concept of development / feature branches
> > (for the time being). I think the goal of having the master branch in a
> > "working state" is a clear, non-contentious goal that equally benefits all
> > users and all developers. "Working state" doesn't necessarily mean
> > production-ready, but means that "most" things work out of the box. Not
> > having this goal would be hugely taxing for everyone.
> >
> > Now, this goal can be achieved either via:
> >
> >    - direct commits to master with proper presubmit coverage,
> >    - direct commits to separate branch(es) with regular integration to
> >    master, or
> >    - a combination of these approaches on a per-pull request basis.
> >
> > I think everyone universally prefers the first choice. Google committers
> > certainly do.
> >
> > However, we start this project from a point of zero presubmit coverage,
> > and with the expectation of a ton of refactoring and breaking changes. We
> > felt that letting all these changes into master right away without proper
> > coverage would make everyone's life a little sad. Here's a very simple
> > example -- the vision we proposed involves completely changing the
> > runner-facing APIs. Every commit of this work will break parts of our
> > project -- such work needs to be separated, otherwise *everyone* gets
> > blocked.
> >
> > [ Additionally, Beam is a somewhat unique project -- it makes sense with
> > services, such as Dataflow Service, Flink or Spark, that are maintained as
> > separate projects -- so, we need decent cross-project integration
> > coverage. ]
> >
> > With this in mind, our proposal is to keep the development / feature
> > branches *for the time being* -- until we replicate a portion of the
> > presubmit coverage we have internally at Google. This is the "default"
> > place for executing against the vision, cleaning up the runner API, etc. As
> > Frances said, there are classes of commits, such as a new IO connector or a
> > new domain-specific PTransform, where we can simplify this and go directly
> > to master. In any case, master should never be behind the development
> > branch by more than a week or so.
> >
> > --
> >
> > re: timing of major releases
> > This is often a contentious topic. The other side of the argument is
> > two-fold:
> >
> >    - Releasing rarely creates a bigger cliff. Hence, it becomes a higher
> >    burden and less likely to happen.
> >    - Releasing rarely optimizes the experience for existing users, not new
> >    users. If we are growing quickly, that may not be the best metric to
> >    optimize for.
> >
> > In reality, we are likely to release more often while we are still
> > maturing, and less often later on.
> >
> > I think decisions in this space should be delayed until the time we need to
> > make a choice and consider the payload we want to release. That said,
> > having some timeline in mind is often beneficial for making the right
> > design tradeoffs.
> >
> > --
> >
> > re: @evolving idea
> > We are already doing this and planning to continue -- this works really
> > well. See @Experimental here
> > <https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/a8347d186700e83edb91a3c673200abec1b1d4e7/sdk/src/main/java/com/google/cloud/dataflow/sdk/annotations/Experimental.java>.
> >
> > On Thu, Feb 18, 2016 at 10:17 AM, Robert Bradshaw <
> > robertwb@google.com.invalid> wrote:
> >
> > > Yes, (b) would be a so-called feature branch. (The proposal here is to
> > > discard the idea of having a separate long-lived "develop" branch.)
> > >
> > > On Thu, Feb 18, 2016 at 10:08 AM, Frances Perry <fjp@google.com.invalid>
> > > wrote:
> > >
> > > > > Our developers are going to be a varied group -- so "main development"
> > > > > will look quite different to different developers. In particular, look at:
> > > > > (a) a developer writing a java sdk extension for a new IO connector
> > > > > (b) a developer changing the beam model
> > > > >
> > > > > I think it's fine for work like (a) to occur on master, but I think things
> > > > > like (b) should happen on a development branch so that we can keep the
> > > > > master branch in a working state. There are going to be a number of large,
> > > > > backwards incompatible, churn-y changes to Runner APIs in the near future.
> > > > > I'd like us to be able to do those in a way that doesn't affect folks who
> > > > > are attempting more surface level contributions.
> > > >
> > > > Frances
> > > >
> > > > On Thu, Feb 18, 2016 at 8:07 AM, Robert Bradshaw <
> > > > robertwb@google.com.invalid> wrote:
> > > >
> > > > > > +1 to using master for main development (and most non-ASF projects use
> > > > > > master like this too). Not having master (the default when one clones,
> > > > > > etc.) be at HEAD is often surprising. Tags are easy enough to use when one
> > > > > > wants a stable version.
> > > > >
> > > > > - Robert
> > > > >
> > > > >
> > > > > > On Wed, Feb 17, 2016 at 11:38 PM, Jean-Baptiste Onofré <jb@nanthrax.net>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks Henry, I remember now, and Frances posted the link.
> > > > > > >
> > > > > > > I agree: we should use the master branch as dev branch as all other ASF
> > > > > > > projects do.
> > > > > >
> > > > > > Regards
> > > > > > JB
> > > > > >
> > > > > >
> > > > > > On 02/18/2016 08:04 AM, Henry Saputra wrote:
> > > > > >
> > > > > > >> Actually no, it is a bit different.
> > > > > > >> The concept of the develop branch follows the "successful git
> > > > > > >> branching model" blog post [1], which introduces using the develop
> > > > > > >> branch as the active branch for development and master as the
> > > > > > >> stable branch.
> > > > > > >>
> > > > > > >> I would recommend using the master branch instead as the default
> > > > > > >> branch for active development, to match other ASF projects.
> > > > > > >>
> > > > > > >> Some projects that used develop at their originating company, like
> > > > > > >> Twill [2], have also moved to using master as the default active
> > > > > > >> branch.
> > > > > >>
> > > > > >> Just my 2 cents.
> > > > > >>
> > > > > >> Thx.
> > > > > >>
> > > > > >> Henry
> > > > > >>
> > > > > >>
> > > > > >> [1] http://nvie.com/posts/a-successful-git-branching-model/
> > > > > >> [2] http://twill.incubator.apache.org/HowToContribute.html
> > > > > >>
> > > > > > >> On Wed, Feb 17, 2016 at 10:52 PM, Jean-Baptiste Onofré
> > > > > > >> <jb@nanthrax.net> wrote:
> > > > > > >>
> > > > > > >>> Hi,
> > > > > > >>>
> > > > > > >>> Correct me if I'm wrong, but I'm assuming that develop == master
> > > > > > >>> (from a git perspective).
> > > > > > >>>
> > > > > > >>> I configured Jenkins this way as it's the "regular" naming ;)
> > > > > > >>>
> > > > > > >>> I think Frances said "develop" from a dev perspective. All
> > > > > > >>> projects use master (it's what I'm doing in Falcon, Lens, Karaf,
> > > > > > >>> Camel, etc, etc).
> > > > > > >>>
> > > > > > >>> Maybe I'm wrong ;)
> > > > > >>>
> > > > > >>> Regards
> > > > > >>> JB
> > > > > >>>
> > > > > >>>
> > > > > >>> On 02/18/2016 06:46 AM, Sandeep Deshmukh wrote:
> > > > > >>>
> > > > > > >>>> Hi All,
> > > > > > >>>>
> > > > > > >>>> I have some comments on the repository structure, and most of
> > > > > > >>>> them are based on my experience in another Apache incubating
> > > > > > >>>> project.
> > > > > > >>>>
> > > > > > >>>>      1. Most active projects use *master* rather than *develop*
> > > > > > >>>>      as the default development branch. For example, Flink,
> > > > > > >>>>      Spark, Storm, Samza, Pig, Hive, and Hadoop use the master
> > > > > > >>>>      branch.
> > > > > > >>>>      2. Released artifacts are always hosted on the downloads
> > > > > > >>>>      page. Master need not be the one in a production-ready
> > > > > > >>>>      state.
> > > > > > >>>>      3. It is quite intuitive to use *master*; otherwise new
> > > > > > >>>>      contributors need to go through documentation to
> > > > > > >>>>      understand the process of each project.
> > > > > > >>>>      4. Overall, the process becomes simpler if *master* is the
> > > > > > >>>>      default branch.
> > > > > > >>>>
> > > > > > >>>> Another suggestion is related to releases with a major version
> > > > > > >>>> change. A major release twice a year is a lot of burden on end
> > > > > > >>>> users if they want to upgrade to a newer version. To address
> > > > > > >>>> this issue, newly added APIs can be marked as @evolving so that
> > > > > > >>>> users are aware of possible changes in the upcoming release,
> > > > > > >>>> while the stable ones should be changed carefully.
> > > > > > >>>>
> > > > > > >>>> Regards,
> > > > > > >>>> Sandeep
> > > > > >>>>
> > > > > > >>>> On Sat, Feb 13, 2016 at 2:34 AM, Frances Perry
> > > > > > >>>> <fjp@google.com.invalid> wrote:
> > > > > > >>>>
> > > > > > >>>>> Thanks for all the feedback! Please keep it coming as needed.
> > > > > > >>>>>
> > > > > > >>>>> We've gone ahead and created components matching this
> > > > > > >>>>> structure:
> > > > > > >>>>>
> > > > > > >>>>> https://issues.apache.org/jira/browse/BEAM/?selectedTab=com.atlassian.jira.jira-projects-plugin:components-panel
> > > > > > >>>>>
> > > > > > >>>>> We'll work on transitioning existing state from Google-internal
> > > > > > >>>>> tools into this over the next few weeks.
> > > > > > >>>>>
> > > > > > >>>>> On Fri, Feb 12, 2016 at 7:47 AM, Kenneth Knowles
> > > > > > >>>>> <klk@google.com.invalid> wrote:
> > > > > > >>>>>
> > > > > > >>>>>> On Thu, Feb 11, 2016 at 8:53 AM, Maximilian Michels
> > > > > > >>>>>> <mxm@apache.org> wrote:
> > > > > > >>>>>>
> > > > > > >>>>>>> As for the /develop branch, I would suggest to make it
> > > > > > >>>>>>> mandatory to have it in a usable state at all times.
> > > > > > >>>>>>
> > > > > > >>>>>> +1
> > > > > > >>>>>>
> > > > > > >>>>>> If breakage is accidentally committed (as will happen) then a
> > > > > > >>>>>> CTR rollback is encouraged.
> > > > > > >>>>>>
> > > > > > >>>>>> Kenn
> > > > > > >>>>>>
> > > > > >>>> --
> > > > > >>> Jean-Baptiste Onofré
> > > > > >>> jbonofre@apache.org
> > > > > >>> http://blog.nanthrax.net
> > > > > >>> Talend - http://www.talend.com
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > > > --
> > > > > > Jean-Baptiste Onofré
> > > > > > jbonofre@apache.org
> > > > > > http://blog.nanthrax.net
> > > > > > Talend - http://www.talend.com
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: status update

Posted by Sandeep Deshmukh <sa...@datatorrent.com>.
Thanks Davor for the detailed explanation of the thought process. That
certainly helps.

I do favor feature branches, as they are a must for major work. The proposed
development/feature branch for the time being is perfectly fine. Just a
caution that keeping master updated could be taxing. We can certainly
revisit this later once the project is stable in Incubation.

re: timing of major releases
With Beam being new and rapidly evolving, it makes sense to favor growth
right now.

Regards
Sandeep



Re: status update

Posted by Davor Bonaci <da...@google.com.INVALID>.
Thanks Sandeep for your insightful comments! I'll add a little more detail
to follow up on Frances' email to clarify what we were proposing.

We were indeed proposing the concept of development / feature branches
(for the time being). I think the goal of having the master branch in a
"working state" is a clear, non-contentious goal that equally benefits all
users and all developers. "Working state" doesn't necessarily mean
production-ready, but means that "most" things work out of the box. Not
having this goal would be hugely taxing for everyone.

Now, this goal can be achieved either via:

   - direct commits to master with proper presubmit coverage,
   - direct commits to separate branch(es) with regular integration to
   master, or
   - a combination of these approaches on a per-pull request basis.

I think everyone universally prefers the first choice. Google committers
certainly do.

However, we start this project from a point of zero presubmit coverage, and
with the expectation of a ton of refactoring and breaking changes. We felt
that letting all these changes into master right away without proper
coverage would make everyone's life a little sad. Here's a very simple
example -- the vision we proposed involves completely changing the
runner-facing APIs. Every commit of this work will break parts of our
project -- such work needs to be separated, otherwise *everyone* gets
blocked.

[ Additionally, Beam is a somewhat unique project -- it makes sense with
services, such as Dataflow Service, Flink or Spark, that are maintained as
separate projects -- so, we need decent cross-project integration
coverage. ]

With this in mind, our proposal is to keep the development / feature branches
*for the time being* -- until we replicate a portion of the presubmit
coverage we have internally at Google. This is the "default" place for
executing against the vision, cleaning up the runner API, etc. As Frances
said, there are classes of commits, such as a new IO connector or a new
domain-specific PTransform, where we can simplify this and go directly to
master. In any case, master should never be behind the development branch
by more than a week or so.

--

re: timing of major releases
This is often a contentious topic. The other side of the argument is
two-fold:

   - Releasing rarely creates a bigger cliff. Hence, it becomes a higher
   burden and less likely to happen.
   - Releasing rarely optimizes the experience for existing users, not new
   users. If we are growing quickly, that may not be the best metric to
   optimize for.

In reality, we are likely to release more often while we are still
maturing, and less often later on.

I think decisions in this space should be delayed until the time we need to
make a choice and consider the payload we want to release. That said,
having some timeline in mind is often beneficial for making the right design
tradeoffs.

--

re: @evolving idea
We are already doing this and planning to continue -- this works really
well. See @Experimental here
<https://github.com/GoogleCloudPlatform/DataflowJavaSDK/blob/a8347d186700e83edb91a3c673200abec1b1d4e7/sdk/src/main/java/com/google/cloud/dataflow/sdk/annotations/Experimental.java>.
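
For anyone who has not seen the pattern, here is a minimal sketch of what
such a marker annotation could look like. It is an illustration only, not the
DataflowJavaSDK source linked above: the names Evolving, EvolvingDemo,
NewConnector, StableTransform and isEvolving are hypothetical, and RUNTIME
retention is assumed here purely so that the example can detect the marker
via reflection.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class EvolvingDemo {

  /**
   * Hypothetical marker annotation: the annotated API is new and may change
   * or be removed in an upcoming release.
   */
  @Documented
  @Retention(RetentionPolicy.RUNTIME) // assumption: RUNTIME so reflection can see it
  @Target({ElementType.TYPE, ElementType.METHOD, ElementType.FIELD, ElementType.CONSTRUCTOR})
  public @interface Evolving {
    /** Optional free-form note explaining what is still in flux. */
    String value() default "";
  }

  /** A newly added API that is explicitly marked as subject to change. */
  @Evolving("new IO connector, API not yet settled")
  public static class NewConnector {}

  /** An API without the marker, which users may treat as stable. */
  public static class StableTransform {}

  /** Returns true if the given type carries the Evolving marker. */
  public static boolean isEvolving(Class<?> type) {
    return type.isAnnotationPresent(Evolving.class);
  }

  public static void main(String[] args) {
    System.out.println("NewConnector evolving? " + isEvolving(NewConnector.class));
    System.out.println("StableTransform evolving? " + isEvolving(StableTransform.class));
  }
}
```

With a marker like this, users can see in the Javadoc (via @Documented) which
parts of the API are still in flux, and tooling could warn on their use.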

On Thu, Feb 18, 2016 at 10:17 AM, Robert Bradshaw <
robertwb@google.com.invalid> wrote:

> Yes, (b) would be a so-called feature branch. (The proposal here is to
> discard the idea of having a separate long-lived "develop" branch.)

Re: status update

Posted by Robert Bradshaw <ro...@google.com.INVALID>.
Yes, (b) would be a so-called feature branch. (The proposal here is to
discard the idea of having a separate long-lived "develop" branch.)

On Thu, Feb 18, 2016 at 10:08 AM, Frances Perry <fj...@google.com.invalid>
wrote:

> Our developers are going to be a varied group -- so "main development" will
> look quite different to different developers. In particular, look at:
> (a) a developer writing a Java SDK extension for a new IO connector
> (b) a developer changing the Beam model
>
> I think it's fine for work like (a) to occur on master, but I think things
> like (b) should happen on a development branch so that we can keep the
> master branch in a working state. There are going to be a number of large,
> backwards-incompatible, churn-y changes to Runner APIs in the near future.
> I'd like us to be able to do those in a way that doesn't affect folks who
> are attempting more surface-level contributions.
>
> Frances

Re: status update

Posted by Frances Perry <fj...@google.com.INVALID>.
Our developers are going to be a varied group -- so "main development" will
look quite different to different developers. In particular, look at:
(a) a developer writing a Java SDK extension for a new IO connector
(b) a developer changing the Beam model

I think it's fine for work like (a) to occur on master, but I think things
like (b) should happen on a development branch so that we can keep the
master branch in a working state. There are going to be a number of large,
backwards-incompatible, churn-y changes to Runner APIs in the near future.
I'd like us to be able to do those in a way that doesn't affect folks who
are attempting more surface-level contributions.

Frances

On Thu, Feb 18, 2016 at 8:07 AM, Robert Bradshaw <
robertwb@google.com.invalid> wrote:

> +1 to using master for main development (and most non-ASF projects use
> master like this too). Not having master (the default when one clones,
> etc.) be at HEAD is often surprising. Tags are easy enough to use when one
> wants a stable version.
>
> - Robert

Re: status update

Posted by Robert Bradshaw <ro...@google.com.INVALID>.
+1 to using master for main development (and most non-ASF projects use
master like this too). Not having master (the default when one clones,
etc.) be at HEAD is often surprising. Tags are easy enough to use when one
wants a stable version.

- Robert


On Wed, Feb 17, 2016 at 11:38 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Thanks Henry, I remember now, and Frances posted the link.
>
> I agree: we should use the master branch as the dev branch, as all other
> ASF projects do.
>
> Regards
> JB
Re: status update

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Thanks Henry, I remember now, and Frances posted the link.

I agree: we should use the master branch as the dev branch, as all other
ASF projects do.

Regards
JB

On 02/18/2016 08:04 AM, Henry Saputra wrote:
> Actually no, it is a bit different.
> The concept of a develop branch follows the "successful git branching
> model" blog post [1], which introduces using a develop branch as the
> active branch for development and master as the stable branch.
>
> I would recommend using the master branch instead as the default branch
> for active development, to match other ASF projects.
>
> Some projects that used develop at their originating company, like
> Twill [2], have also moved to using master as the default active branch.
>
> Just my 2 cents.
>
> Thx.
>
> Henry
>
>
> [1] http://nvie.com/posts/a-successful-git-branching-model/
> [2] http://twill.incubator.apache.org/HowToContribute.html

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: status update

Posted by Henry Saputra <he...@gmail.com>.
Actually no, it is a bit different.
The concept of a develop branch follows the "successful git branching
model" blog post [1], which introduces using a develop branch as the
active branch for development and master as the stable branch.

I would recommend using the master branch instead as the default branch
for active development, to match other ASF projects.

Some projects that used develop at their originating company, like
Twill [2], have also moved to using master as the default active branch.

Just my 2 cents.

Thx.

Henry


[1] http://nvie.com/posts/a-successful-git-branching-model/
[2] http://twill.incubator.apache.org/HowToContribute.html

On Wed, Feb 17, 2016 at 10:52 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi,
>
> Correct me if I'm wrong, but I'm assuming that develop == master (from a
> git perspective).
>
> I configured Jenkins this way as it's the "regular" naming ;)
>
> I think Frances said "develop" from a dev perspective. All projects use
> master (it's what I'm doing in Falcon, Lens, Karaf, Camel, etc, etc).
>
> Maybe I'm wrong ;)
>
> Regards
> JB

Re: status update

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi,

Correct me if I'm wrong, but I'm assuming that develop == master (from a 
git perspective).

I configured Jenkins this way as it's the "regular" naming ;)

I think Frances said "develop" from a dev perspective. All projects use 
master (it's what I'm doing in Falcon, Lens, Karaf, Camel, etc, etc).

Maybe I'm wrong ;)

Regards
JB

On 02/18/2016 06:46 AM, Sandeep Deshmukh wrote:
> Hi All,
>
> I have some comments on the repository structure; most of them are based on
> my experience in another Apache incubating project.
>
>
>     1. Most active projects use *master* rather than *develop* as the
>     default development branch. For example, Flink, Spark, Storm, Samza,
>     Pig, Hive, and Hadoop use the master branch.
>     2. Released artifacts are always hosted on the downloads page. Master
>     need not be the branch in a production-ready state.
>     3. It is quite intuitive to use *master*; otherwise new contributors
>     need to go through documentation to understand each project's process.
>     4. Overall, the process becomes simpler if *master* is the default branch.
>
>
> Another suggestion relates to releases with a major version change. A major
> release twice a year is a lot of burden on end users who want to
> upgrade to a newer version. To address this issue, newly added APIs can be
> marked as @evolving so that users are aware of possible changes in
> upcoming releases, while stable APIs should be changed only with care.
>
> Regards,
> Sandeep

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: status update

Posted by Sandeep Deshmukh <sa...@datatorrent.com>.
Hi All,

I have some comments on the repository structure; most of them are based on
my experience in another Apache incubating project.


   1. Most active projects use *master* rather than *develop* as the
   default development branch. For example, Flink, Spark, Storm, Samza,
   Pig, Hive, and Hadoop use the master branch.
   2. Released artifacts are always hosted on the downloads page. Master
   need not be the branch in a production-ready state.
   3. It is quite intuitive to use *master*; otherwise new contributors
   need to go through documentation to understand each project's process.
   4. Overall, the process becomes simpler if *master* is the default branch.


Another suggestion relates to releases with a major version change. A major
release twice a year is a lot of burden on end users who want to
upgrade to a newer version. To address this issue, newly added APIs can be
marked as @evolving so that users are aware of possible changes in
upcoming releases, while stable APIs should be changed only with care.
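
To make the @evolving idea concrete, here is a minimal sketch in Java. It
assumes a hypothetical marker annotation named Evolving and a made-up
ExampleIO class; neither is an existing Beam API, and the names are chosen
only for illustration. The annotation is retained at runtime so that tools
(or users) can check whether a given method is still subject to change.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class EvolvingExample {
  /** Hypothetical marker: the annotated API may change in an upcoming release. */
  @Documented
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.TYPE, ElementType.METHOD})
  public @interface Evolving {}

  /** Hypothetical connector class, used only for illustration. */
  public static class ExampleIO {
    /** Stable API: should be changed only with care. */
    public String read() {
      return "stable";
    }

    /** Newly added API, flagged so users know it may still change. */
    @Evolving
    public String readWithHints() {
      return "evolving";
    }
  }

  /** Returns true if the named public no-arg method of ExampleIO is marked @Evolving. */
  public static boolean isEvolving(String methodName) {
    try {
      return ExampleIO.class.getMethod(methodName).isAnnotationPresent(Evolving.class);
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException("No such method: " + methodName, e);
    }
  }

  public static void main(String[] args) {
    System.out.println("read is evolving: " + isEvolving("read"));
    System.out.println("readWithHints is evolving: " + isEvolving("readWithHints"));
  }
}
```

Because the marker is also @Documented, it would show up in generated
Javadoc, which is one way users could learn which APIs are not yet stable.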

Regards,
Sandeep

On Sat, Feb 13, 2016 at 2:34 AM, Frances Perry <fj...@google.com.invalid>
wrote:

> Thanks for all the feedback! Please keep it coming as needed.
>
> We've gone ahead and created components matching this structure:
>
> https://issues.apache.org/jira/browse/BEAM/?selectedTab=com.atlassian.jira.jira-projects-plugin:components-panel
>
> We'll work on transitioning existing state from Google-internal tools into
> this over the next few weeks.

Re: status update

Posted by Frances Perry <fj...@google.com.INVALID>.
Thanks for all the feedback! Please keep it coming as needed.

We've gone ahead and created components matching this structure:
https://issues.apache.org/jira/browse/BEAM/?selectedTab=com.atlassian.jira.jira-projects-plugin:components-panel

We'll work on transitioning existing state from Google-internal tools into
this over the next few weeks.


On Fri, Feb 12, 2016 at 7:47 AM, Kenneth Knowles <kl...@google.com.invalid>
wrote:

> On Thu, Feb 11, 2016 at 8:53 AM, Maximilian Michels <mx...@apache.org>
> wrote:
>
> > As for the /develop branch, I would suggest making
> > it mandatory to have it in a usable state at all times.
> >
>
> +1
>
> If breakage is accidentally committed (as will happen) then a CTR rollback
> is encouraged.
>
> Kenn
>

Re: status update

Posted by Kenneth Knowles <kl...@google.com.INVALID>.
On Thu, Feb 11, 2016 at 8:53 AM, Maximilian Michels <mx...@apache.org> wrote:

> As for the /develop branch, I would suggest making
> it mandatory to have it in a usable state at all times.
>

+1

If breakage is accidentally committed (as will happen) then a CTR rollback
is encouraged.

Kenn

Re: status update

Posted by Maximilian Michels <mx...@apache.org>.
Hi Frances,

Thank you for the documents. The structure of the repository looks
good. I wonder if "core" could be divided even further, e.g. into API-
and runtime-related modules. For the CI, we could check out Apache's
Jenkins or Travis CI. As for the /develop branch, I would suggest
making it mandatory to have it in a usable state at all times.

Regarding the rework of the Dataflow SDK, it would be great if you
could give some status updates while you're doing that. This could
help us prepare the runners for any major breaking changes
(I'm particularly thinking of the Flink runner because I've been
working on it).

Best,
Max

On Tue, Feb 9, 2016 at 5:01 PM, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> Hi Frances,
>
> and thanks for the update.
>
> The repository structure looks good to me.
>
> Maybe we can add a section about the PR workflow (PR/review/push). WDYT ?
>
> For the Jira, no problem. I will add some tasks as well related to the
> roadmap (especially the DSLs, new IO, and DataIntegration part).
>
> Thanks !
> Regards
> JB
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com

Re: status update

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Frances,

and thanks for the update.

The repository structure looks good to me.

Maybe we can add a section about the PR workflow (PR/review/push). WDYT?

For the Jira, no problem. I will also add some tasks related to the
roadmap (especially the DSLs, new IO, and DataIntegration part).

Thanks!
Regards
JB

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com