Posted to dev@beam.apache.org by Robert Bradshaw <ro...@google.com> on 2018/10/10 08:12:42 UTC

Splitting the repo

Hi everyone,

While IMHO it's too early to even be able to split the repo, it's not too
early to talk about it, and I wanted to spin this off to keep the other
thread focused.

In particular, I am trying to figure out exactly what is hoped to be gained
by splitting things up. In my experience, a single project that spans
multiple repos has always come with excessive overhead and pain. Of note,
we recently merged the website and dataflow-worker into the main repo
*exactly* to avoid this pain (though the latter was particularly bad due to
one of the repos being private).

If need be, I don't see any reason we can't have a single repo with
directories

model/
website/
java/
go/
...

possibly even with their own build system (unified only through a top-level
"build everything" script that descends into each subdir and runs the
appropriate command). I'm not saying we should do this (there is value in
having a single consistent build system, etc.) but it's possible. We could
probably even make separate releases out of this single repo (if we wanted,
though given that our releases are time-based rather than feature-based, I
don't see much advantage here).
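
A minimal sketch of the top-level "build everything" script described above,
in Python. The directory names come from the list above; the per-directory
commands and the function names are assumptions chosen for illustration only,
not what any Beam build actually runs.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from top-level directory to its native build command.
# Only the directory names come from the message; the commands are assumed.
BUILD_COMMANDS = {
    "model": ["make"],
    "website": ["make"],
    "java": ["./gradlew", "build"],
    "go": ["go", "build", "./..."],
}


def plan_builds(root, commands=BUILD_COMMANDS):
    """Return (directory, command) pairs for subdirectories that exist under root."""
    root = Path(root)
    return [(root / name, cmd) for name, cmd in commands.items() if (root / name).is_dir()]


def build_everything(root, commands=BUILD_COMMANDS, run=subprocess.run):
    """Descend into each subdirectory, run its command, and return the names that failed."""
    failed = []
    for directory, cmd in plan_builds(root, commands):
        result = run(cmd, cwd=directory)  # each subdir keeps its own build system
        if result.returncode != 0:
            failed.append(directory.name)
    return failed
```

Calling build_everything(".") from the repository root would run each
subdirectory's own build in turn and report which ones failed; the `run`
parameter exists only so the loop can be exercised without invoking real tools.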

Also, there was this comment:

On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau <rm...@gmail.com>
wrote:
>
> Side note: beam portability would be saner if added on top of others than
> the opposite which is done today.

I think you brought this up before, Romain. I'm still trying to wrap my
head around what you mean here. Could you elaborate what such a structure
would look like?

Re: Splitting the repo

Posted by Romain Manni-Bucau <rm...@gmail.com>.
This looks functional, whereas the split is more about languages and about
making the build smooth and efficient to work with, so that we get back up to
speed. Runners can stay in the Java land/subproject as long as they are not
written in other languages, for instance, so the API between core and runner
can stay as it is for that topic.

On Wed, Oct 10, 2018 at 11:58, Robert Bradshaw <ro...@google.com> wrote:

> On Wed, Oct 10, 2018 at 10:25 AM Romain Manni-Bucau <rm...@gmail.com>
> wrote:
>
>> On the split point: a mono-repo works for me as well. The main point is
>> "N separate builds".
>>
>> On the portable thing: currently runner integrates with portable api. It
>> impacts all runner. The needed code is the same everywhere since it is
>> mainly a DoFn at the end (a bit caricatural but that is the big picture) so
>> at the end the portable impl can be unique and built in top of any runner.
>> The gains are:
>>
>> 1. Dont pollute java users
>> 2. Single code maintenance
>> 3. Support to upgrade the runner without changing this layer (contract
>> based integration - vs coupled one - so smoother updates in all layers)
>> 4. Simpler code (at least in design)
>>
>> Hooe it is clearer
>>
>
> Right now the basic structure is
>
>   SDK
>   \
>     [PortabilityAPI]
>   /
>   Beam Runners Core Library
>   \
>     [BeamRunnersCoreAPI]
>   /
>   Beam RunnerX Adapter Code
>   \
>     [RunnerXAPI]
>   /
>   Java RunnerX
>
> Where the APIs in brackets are what are used for the various components to
> talk to each other, and the later two are in Java. It sounds like what
> you're advocating for is the (Java) Beam Runners Core Library (along with
> its API). Am I understanding correctly? Of course some things are easier to
> abstract away than others (e.g. how SDK processes, if not in process, are
> launched (including staging their dependencies) and monitored is squarely
> in the domain of the particular runner, though we can abstract as much
> common, helper code as possible to higher levels).
>
>
>
>> On Wed, Oct 10, 2018 at 11:18, Jean-Baptiste Onofré <jb...@nanthrax.net>
>> wrote:
>>
>>> Hi,
>>>
>>> +1, even I think we could split the core even deeper.
>>>
>>> I discussed with Luke and Reuven to introduce core-sql, core-schema,
>>> core-sdf, ...
>>>
>>> It's not a huge effort, and would allow us to move forward on Beam "more
>>> API oriented" approach.
>>>
>>> Regards
>>> JB
>>>
>>> On 10/10/2018 10:12, Robert Bradshaw wrote:
>>> > Hi everyone,
>>> >
>>> > While IMHO it's too early to even be able to split the repo, it's not
>>> to
>>> > early to talk about it, and I wanted to spin this off to keep the other
>>> > thread focused.
>>> >
>>> > In particular, I am trying to figure out exactly what is hoped to be
>>> > gained by splitting things up. In my experience, a single project that
>>> > spans multiple repos has always come with excessive overhead and pain.
>>> > Of note, we recently merged the website and dataflow-worker into the
>>> > main repo *exactly* to avoid this pain (though the latter was
>>> > particularly bad due to one of the repos being private).
>>> >
>>> > If need be, I don't see any reason we can't have a single repo with
>>> > directories
>>> >
>>> > model/
>>> > website/
>>> > java/
>>> > go/
>>> > ...
>>> >
>>> > possibly even with their own build system (unified only through a
>>> > top-level "build everything" script that descends into each subdir and
>>> > runs the appropriate command). I'm not saying we should do this (there
>>> > is value in having a single consistent build system, etc.) but it's
>>> > possible. We could probably even make separate releases out of this
>>> > single repo (if we wanted, though given that our releases are
>>> time-based
>>> > rather than feature-based, I don't see much advantage here).
>>> >
>>> > Also, there was the comment.
>>> >
>>> > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
>>> > <rmannibucau@gmail.com> wrote:
>>> >>
>>> >> Side note: beam portability would be saner if added on top of others
>>> > than the opposite which is done today.
>>> >
>>> > I think you brought this up before, Romain. I'm still trying to wrap my
>>> > head around what you mean here. Could you elaborate what such a
>>> > structure would look like?
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbonofre@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>

Re: Splitting the repo

Posted by Robert Bradshaw <ro...@google.com>.
On Wed, Oct 10, 2018 at 10:25 AM Romain Manni-Bucau <rm...@gmail.com>
wrote:

> On the split point: a mono-repo works for me as well. The main point is "N
> separate builds".
>
> On the portable thing: currently runner integrates with portable api. It
> impacts all runner. The needed code is the same everywhere since it is
> mainly a DoFn at the end (a bit caricatural but that is the big picture) so
> at the end the portable impl can be unique and built in top of any runner.
> The gains are:
>
> 1. Dont pollute java users
> 2. Single code maintenance
> 3. Support to upgrade the runner without changing this layer (contract
> based integration - vs coupled one - so smoother updates in all layers)
> 4. Simpler code (at least in design)
>
> Hooe it is clearer
>

Right now the basic structure is

  SDK
  \
    [PortabilityAPI]
  /
  Beam Runners Core Library
  \
    [BeamRunnersCoreAPI]
  /
  Beam RunnerX Adapter Code
  \
    [RunnerXAPI]
  /
  Java RunnerX

Where the APIs in brackets are what the various components use to talk to
each other, and the latter two are in Java. It sounds like what
you're advocating for is the (Java) Beam Runners Core Library (along with
its API). Am I understanding correctly? Of course some things are easier to
abstract away than others (e.g. how SDK processes, if not in-process, are
launched (including staging their dependencies) and monitored is squarely
in the domain of the particular runner, though we can abstract as much
common helper code as possible to higher levels).
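
To make the layering in the diagram concrete, here is a rough Python sketch.
All class and method names below (run_native, execute_stage, run_pipeline,
InMemoryRunnerX) are hypothetical and are not Beam's actual API; the only
thing taken from the diagram is the idea that shared core code talks to a
contract, and a thin per-runner adapter maps that contract onto the runner's
native API.

```python
from typing import Callable, Iterable, List, Protocol


class RunnerXAPI(Protocol):
    """Hypothetical stand-in for a runner's native API ([RunnerXAPI])."""

    def run_native(self, fn: Callable[[int], int], data: Iterable[int]) -> List[int]: ...


class BeamRunnersCoreAPI(Protocol):
    """Hypothetical contract between the core library and an adapter."""

    def execute_stage(self, fn: Callable[[int], int], data: Iterable[int]) -> List[int]: ...


class InMemoryRunnerX:
    """Toy runner that applies a function to every element in memory."""

    def run_native(self, fn, data):
        return [fn(x) for x in data]


class RunnerXAdapter:
    """Adapter code: implements the core contract by delegating to RunnerX."""

    def __init__(self, runner: RunnerXAPI):
        self._runner = runner

    def execute_stage(self, fn, data):
        return self._runner.run_native(fn, data)


class RunnersCoreLibrary:
    """Shared code written once against the contract, not against a runner."""

    def __init__(self, backend: BeamRunnersCoreAPI):
        self._backend = backend

    def run_pipeline(self, stages, data):
        # Feed the output of each stage into the next one.
        for stage in stages:
            data = self._backend.execute_stage(stage, data)
        return list(data)
```

In this sketch, supporting another runner would mean writing only a new
adapter; RunnersCoreLibrary stays unchanged.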



> On Wed, Oct 10, 2018 at 11:18, Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
>
>> Hi,
>>
>> +1, even I think we could split the core even deeper.
>>
>> I discussed with Luke and Reuven to introduce core-sql, core-schema,
>> core-sdf, ...
>>
>> It's not a huge effort, and would allow us to move forward on Beam "more
>> API oriented" approach.
>>
>> Regards
>> JB
>>
>> On 10/10/2018 10:12, Robert Bradshaw wrote:
>> > Hi everyone,
>> >
>> > While IMHO it's too early to even be able to split the repo, it's not to
>> > early to talk about it, and I wanted to spin this off to keep the other
>> > thread focused.
>> >
>> > In particular, I am trying to figure out exactly what is hoped to be
>> > gained by splitting things up. In my experience, a single project that
>> > spans multiple repos has always come with excessive overhead and pain.
>> > Of note, we recently merged the website and dataflow-worker into the
>> > main repo *exactly* to avoid this pain (though the latter was
>> > particularly bad due to one of the repos being private).
>> >
>> > If need be, I don't see any reason we can't have a single repo with
>> > directories
>> >
>> > model/
>> > website/
>> > java/
>> > go/
>> > ...
>> >
>> > possibly even with their own build system (unified only through a
>> > top-level "build everything" script that descends into each subdir and
>> > runs the appropriate command). I'm not saying we should do this (there
>> > is value in having a single consistent build system, etc.) but it's
>> > possible. We could probably even make separate releases out of this
>> > single repo (if we wanted, though given that our releases are time-based
>> > rather than feature-based, I don't see much advantage here).
>> >
>> > Also, there was the comment.
>> >
>> > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
>> > <rmannibucau@gmail.com> wrote:
>> >>
>> >> Side note: beam portability would be saner if added on top of others
>> > than the opposite which is done today.
>> >
>> > I think you brought this up before, Romain. I'm still trying to wrap my
>> > head around what you mean here. Could you elaborate what such a
>> > structure would look like?
>>
>> --
>> Jean-Baptiste Onofré
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>

Re: Splitting the repo

Posted by Romain Manni-Bucau <rm...@gmail.com>.
On the split point: a mono-repo works for me as well. The main point is "N
separate builds".

On the portable thing: currently each runner integrates with the portable API,
which impacts all runners. The needed code is the same everywhere, since in
the end it is mainly a DoFn (a bit of a caricature, but that is the big
picture), so the portable implementation can be unique and built on top of any
runner. The gains are:

1. Don't pollute Java users
2. Single code maintenance
3. Support for upgrading the runner without changing this layer (contract
based integration - vs coupled one - so smoother updates in all layers)
4. Simpler code (at least in design)

Hope it is clearer

On Wed, Oct 10, 2018 at 11:18, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:

> Hi,
>
> +1, I even think we could split the core further.
>
> I discussed introducing core-sql, core-schema, core-sdf, ... with Luke and
> Reuven.
>
> It's not a huge effort, and would allow us to move forward on a "more
> API oriented" approach for Beam.
>
> Regards
> JB
>
> On 10/10/2018 10:12, Robert Bradshaw wrote:
> > Hi everyone,
> >
> > While IMHO it's too early to even be able to split the repo, it's not to
> > early to talk about it, and I wanted to spin this off to keep the other
> > thread focused.
> >
> > In particular, I am trying to figure out exactly what is hoped to be
> > gained by splitting things up. In my experience, a single project that
> > spans multiple repos has always come with excessive overhead and pain.
> > Of note, we recently merged the website and dataflow-worker into the
> > main repo *exactly* to avoid this pain (though the latter was
> > particularly bad due to one of the repos being private).
> >
> > If need be, I don't see any reason we can't have a single repo with
> > directories
> >
> > model/
> > website/
> > java/
> > go/
> > ...
> >
> > possibly even with their own build system (unified only through a
> > top-level "build everything" script that descends into each subdir and
> > runs the appropriate command). I'm not saying we should do this (there
> > is value in having a single consistent build system, etc.) but it's
> > possible. We could probably even make separate releases out of this
> > single repo (if we wanted, though given that our releases are time-based
> > rather than feature-based, I don't see much advantage here).
> >
> > Also, there was the comment.
> >
> > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
> > <rmannibucau@gmail.com> wrote:
> >>
> >> Side note: beam portability would be saner if added on top of others
> > than the opposite which is done today.
> >
> > I think you brought this up before, Romain. I'm still trying to wrap my
> > head around what you mean here. Could you elaborate what such a
> > structure would look like?
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: Splitting the repo

Posted by Robert Bradshaw <ro...@google.com>.
On Wed, Oct 10, 2018 at 9:21 PM Kenneth Knowles <ke...@apache.org> wrote:
>
> I think Robert's initial question needs to be focused on a particular split.

Yes, thanks for bringing this back to the original question.

> I agree that a "single project spanning multiple repos" does not make sense. But separate projects in separate repos is pretty widely used :-). The point of separate repos IMO would be to empower (and force) them to act as separate projects.
>
> Every monorepo I have worked in has struggled with modularity problems. But conversely, a project with poor modularity can thrive in a monorepo because it is feasible to make changes across all the bits that are tightly coupled. Because it is a subtext whenever a Google employee talks about monorepos, I want to call out that Google's uniquely massive and interesting monorepo requires a tremendous amount of bespoke infrastructure to manage coupling, testing, ownership, etc*. It is not analogous to a large repo on GitHub.
>
> So... which pieces are "not separate enough" and why and how do we want to make them separate?
>
> I can think of some candidates that could benefit from some kind of "separateness":
>
>  - IOs or collections of IOs: separate release cadence, only build on stable SDK releases (potential for diamond dep problems)
>  - Portability protos: forces them to be highly stable and forces runners to adapt to major iterations
>  - Language SDKs: easier to build a community of devs with a clearly familiar project structure and toolchain
>
> Maybe the kinds of separation that folks want does not have to be a separate repo, as mentioned. But it is still important that most infrastructure and UI is geared towards a certain scale of project (not just repo): issue tracking, pull request management, mailing lists, ownership, selective test execution, triaging test failures, etc.

+1. I don't think the subcomponents of Beam are yet independent or
large enough to merit being separate projects. (One criterion for being
a separate project is having its own website; otherwise where should
the website sources live?) Another criterion is the point at which
there is more gain than pain in allowing users to mix and match
different versions of different projects (and the forcing function of
being highly stable becomes more of an asset than a hindrance).

We may of course get there in time, but I don't think we're there yet
(certainly not until portability settles down at least), and consensus
seems to be that better divisions in the existing repo would resolve
most people's concerns at the moment.

> At this point, I see strong arguments in both directions and think that a specific proposal of a specific split at the right time deserves an individualized discussion.
>
> Kenn
>
> *Other issues include governance and effectiveness for shipping user-friendly libraries
>
>
>
>
> On Wed, Oct 10, 2018 at 11:12 AM Ankur Goenka <go...@google.com> wrote:
>>
>> Hi,
>>
>> I think the subtext here is that development is hard in general, and I agree. Major causes are the diversity of languages, the complexity of the project, and legacy code.
>> To alleviate language-related issues, we are trying to keep the code modular, which it already is to a certain extent.
>> On the other hand, tooling is still evolving and needs improvement. I also feel that tooling is a moving target and it's good to keep reevaluating it.
>> Tooling is a problem for everyone (the whole community) and we are actively trying to solve it. Gradle is a big step towards that.
>> I personally contribute to multiple languages. Many PRs have changes spanning languages and have to be merged as a whole. I personally feel that having a unified build system makes it easier to do the checks and make sure things work.
>> Even after Gradle, I am still able to set up IntelliJ for Java, PyCharm for Python and GoLand for Go as I would have done earlier (before Gradle). I am also able to run "python setup.py sdist" as I was able to do before Gradle.
>> Gradle also acts as the top-level task manager, and most of the Python tasks are just plain shell commands stitched together.
>> The only real problem that I face in my setup is the vendored Java jars, which only impact Java development.
>> Documenting a separate environment-specific setup for each language is probably sufficient to address the issue.
>>
>> I also agree with Max that splitting the repo will cause more pain than gain.
>>
>> ~Ankur
>>
>>
>>
>> On Wed, Oct 10, 2018 at 7:56 AM Romain Manni-Bucau <rm...@gmail.com> wrote:
>>>
>>>
>>>
>>>
>>> On Wed, Oct 10, 2018 at 14:59, Maximilian Michels <mx...@apache.org> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I agree that splitting up Beam into separate repositories would cause
>>>> more pain than gain.
>>>>
>>>> To a large degree we already have independent modules, e.g. runners/* or
>>>> sdks/*. Although this is not the case for the core. It would be
>>>> desirable to break it up further.
>>>
>>>
>>> Think this part is ok for everyone.
>>>
>>>>
>>>>
>>>>  > possibly even with their own build system (unified only through a
>>>>  > top-level "build everything" script that descends into each subdir and
>>>>  > runs the appropriate command).
>>>>
>>>> This is almost what we have. Yes, there are some dependencies on the
>>>> Beam Gradle Plugin, but even if we had completely independent build
>>>> directories, you'd still want to have a shared config/tasks across the
>>>> projects (which might bring you back to a setup similar to what we have).
>>>>
>>>> One of the pain points seems to be the portability which "polluted" some
>>>> parts of the project (e.g. legacy Runners). As mentioned in this thread
>>>> that could have been solved with an abstraction. But the lack of
>>>> abstraction also forced us to adopt the portable pipeline code quicker.
>>>
>>>
>>> Not at all. Assume we have a full build that handles portability, followed by 3 concurrent builds (Go, Python, Java):
>>> then we still have the "current step" in the CI, but developers are never affected by it, and the build does not mess up their machines either.
>>>
>>> Today the main blocker is that the default "profile" (script) does not match the dev persona, and therefore there is no real hope of getting external contributions
>>> from outside Google-related people, as mentioned with the previous figures, which is sad for a project promising unification and work between communities, IMHO.
>>>
>>>>
>>>>
>>>> -Max
>>>>
>>>> On 10.10.18 10:51, Romain Manni-Bucau wrote:
>>>> > Yep for the split
>>>> >
>>>> > For the clean point, it is closely linked to the build tools and to the
>>>> > fake environments needed for modules that are not native to the build
>>>> > tool (Go under Gradle, which is Java-first, for instance). This is why
>>>> > having a real build that is natural for each language would be
>>>> > beneficial IMO.
>>>> >
>>>> > On Wed, Oct 10, 2018 at 11:38, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
>>>> >
>>>> >     Correct, it's more "module splitting" than repositories indeed.
>>>> >
>>>> >     Regards
>>>> >     JB
>>>> >
>>>> >     On 10/10/2018 10:35, Robert Bradshaw wrote:
>>>> >      > Gotcha. So this is more about dividing the code (particularly core)
>>>> >      > into finer modules, rather than splitting the modules into separate
>>>> >      > repositories, right?
>>>> >      >
>>>> >      > On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré
>>>> >      > <jb@nanthrax.net> wrote:
>>>> >      >
>>>> >      >     The purpose is that we have a monolithic core today mostly
>>>> >      >     providing abstract classes.
>>>> >      >
>>>> >      >     The idea is to have something more API oriented with
>>>> >      >     interface/SPI.
>>>> >      >
>>>> >      >     Our users would then be able to pick the part of the core
>>>> >      >     they want, resulting in lighter artifacts, and for us, it
>>>> >      >     gives a more flexible approach.
>>>> >      >
>>>> >      >     Regards
>>>> >      >     JB
>>>> >      >
>>>> >      >     On 10/10/2018 10:26, Robert Bradshaw wrote:
>>>> >      >     > My question was not whether we should split the repo, but why?
>>>> >      >     > (Dividing things into more (or fewer) modules within a single
>>>> >      >     > repo is a separate question.) Maybe I'm just not following what
>>>> >      >     > you mean by "more API oriented." It would force stabler APIs.
>>>> >      >     >
>>>> >      >     > On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré
>>>> >      >     > <jb@nanthrax.net> wrote:
>>>> >      >     >
>>>> >      >     >     Hi,
>>>> >      >     >
>>>> >      >     >     +1, even I think we could split the core even deeper.
>>>> >      >     >
>>>> >      >     >     I discussed with Luke and Reuven to introduce core-sql,
>>>> >      >     core-schema,
>>>> >      >     >     core-sdf, ...
>>>> >      >     >
>>>> >      >     >     It's not a huge effort, and would allow us to move
>>>> >     forward on
>>>> >      >     Beam "more
>>>> >      >     >     API oriented" approach.
>>>> >      >     >
>>>> >      >     >     Regards
>>>> >      >     >     JB
>>>> >      >     >
>>>> >      >     >     On 10/10/2018 10:12, Robert Bradshaw wrote:
>>>> >      >     >     > Hi everyone,
>>>> >      >     >     >
>>>> >      >     >     > While IMHO it's too early to even be able to split
>>>> >     the repo,
>>>> >      >     it's
>>>> >      >     >     not to
>>>> >      >     >     > early to talk about it, and I wanted to spin this off to
>>>> >      >     keep the
>>>> >      >     >     other
>>>> >      >     >     > thread focused.
>>>> >      >     >     >
>>>> >      >     >     > In particular, I am trying to figure out exactly what is
>>>> >      >     hoped to be
>>>> >      >     >     > gained by splitting things up. In my experience, a single
>>>> >      >     project that
>>>> >      >     >     > spans multiple repos has always come with excessive
>>>> >     overhead
>>>> >      >     and pain.
>>>> >      >     >     > Of note, we recently merged the website and
>>>> >     dataflow-worker
>>>> >      >     into the
>>>> >      >     >     > main repo *exactly* to avoid this pain (though the
>>>> >     latter was
>>>> >      >     >     > particularly bad due to one of the repos being private).
>>>> >      >     >     >
>>>> >      >     >     > If need be, I don't see any reason we can't have a single
>>>> >      >     repo with
>>>> >      >     >     > directories
>>>> >      >     >     >
>>>> >      >     >     > model/
>>>> >      >     >     > website/
>>>> >      >     >     > java/
>>>> >      >     >     > go/
>>>> >      >     >     > ...
>>>> >      >     >     >
>>>> >      >     >     > possibly even with their own build system (unified only
>>>> >      >     through a
>>>> >      >     >     > top-level "build everything" script that descends
>>>> >     into each
>>>> >      >     subdir and
>>>> >      >     >     > runs the appropriate command). I'm not saying we
>>>> >     should do
>>>> >      >     this (there
>>>> >      >     >     > is value in having a single consistent build system,
>>>> >     etc.)
>>>> >      >     but it's
>>>> >      >     >     > possible. We could probably even make separate
>>>> >     releases out
>>>> >      >     of this
>>>> >      >     >     > single repo (if we wanted, though given that our
>>>> >     releases are
>>>> >      >     >     time-based
>>>> >      >     >     > rather than feature-based, I don't see much advantage
>>>> >     here).
>>>> >      >     >     >
>>>> >      >     >     > Also, there was the comment.
>>>> >      >     >     >
>>>> >      >     >     > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
>>>> >      >     >     > <rmannibucau@gmail.com> wrote:
>>>> >      >     >     >>
>>>> >      >     >     >> Side note: beam portability would be saner if added
>>>> >     on top
>>>> >      >     of others
>>>> >      >     >     > than the opposite which is done today.
>>>> >      >     >     >
>>>> >      >     >     > I think you brought this up before, Romain. I'm still
>>>> >     trying to
>>>> >      >     >     wrap my
>>>> >      >     >     > head around what you mean here. Could you elaborate
>>>> >     what such a
>>>> >      >     >     > structure would look like?
>>>> >      >     >
>>>> >      >     >     --
>>>> >      >     >     Jean-Baptiste Onofré
>>>> >      >     >     jbonofre@apache.org
>>>> >      >     >     http://blog.nanthrax.net
>>>> >      >     >     Talend - http://www.talend.com
>>>> >      >     >
>>>> >      >
>>>> >      >     --
>>>> >      >     Jean-Baptiste Onofré
>>>> >      >     jbonofre@apache.org
>>>> >      >     http://blog.nanthrax.net
>>>> >      >     Talend - http://www.talend.com
>>>> >      >
>>>> >
>>>> >     --
>>>> >     Jean-Baptiste Onofré
>>>> >     jbonofre@apache.org
>>>> >     http://blog.nanthrax.net
>>>> >     Talend - http://www.talend.com
>>>> >

Re: Splitting the repo

Posted by Kenneth Knowles <ke...@apache.org>.
I think Robert's initial question needs to be focused on a particular split.

I agree that a "single project spanning multiple repos" does not make
sense. But separate projects in separate repos is pretty widely used :-). The
point of separate repos IMO would be to empower (and force) them to act as
separate projects.

Every monorepo I have worked in has struggled with modularity problems. But
conversely, a project with poor modularity can thrive in a monorepo because
it is feasible to make changes across all the bits that are tightly
coupled. Because it is a subtext whenever a Google employee talks about
monorepos, I want to call out that Google's uniquely massive and
interesting monorepo requires a tremendous amount of bespoke infrastructure
to manage coupling, testing, ownership, etc*. It is not analogous to a
large repo on GitHub.

So... which pieces are "not separate enough" and why and how do we want to
make them separate?

I can think of some candidates that could benefit from some kind of
"separateness":

 - IOs or collections of IOs: separate release cadence, only build on
stable SDK releases (potential for diamond dep problems)
 - Portability protos: forces them to be highly stable and forces runners
to adapt to major iterations
 - Language SDKs: easier to build a community of devs with a clearly
familiar project structure and toolchain

Maybe the kinds of separation that folks want does not have to be a
separate repo, as mentioned. But it is still important that most
infrastructure and UI is geared towards a certain scale of project (not
just repo): issue tracking, pull request management, mailing lists,
ownership, selective test execution, triaging test failures, etc.

At this point, I see strong arguments in both directions and think that a
specific proposal of a specific split at the right time deserves an
individualized discussion.

Kenn

*Other issues include governance and effectiveness for shipping
user-friendly libraries




On Wed, Oct 10, 2018 at 11:12 AM Ankur Goenka <go...@google.com> wrote:

> Hi,
>
> I think the subtext here is that development is hard in general, and I
> agree. Major causes are the diversity of languages, the complexity of the
> project, and legacy code.
> To alleviate language-related issues, we are trying to keep the code
> modular, which it already is to a certain extent.
> On the other hand, tooling is still evolving and needs improvement. I also
> feel that tooling is a moving target and it's good to keep reevaluating
> it.
> Tooling is a problem for everyone (the whole community) and we are
> actively trying to solve it. Gradle is a big step towards that.
> I personally contribute to multiple languages. Many PRs have changes
> spanning languages and have to be merged as a whole. I personally
> feel that having a unified build system makes it easier to do the checks
> and make sure things work.
> Even after Gradle, I am still able to set up IntelliJ for Java, PyCharm for
> Python and GoLand for Go as I would have done earlier (before Gradle). I am
> also able to run "python setup.py sdist" as I was able to do before Gradle.
> Gradle also acts as the top-level task manager, and most of the Python
> tasks are just plain shell commands stitched together.
> The only real problem that I face in my setup is the vendored Java jars,
> which only impact Java development.
> Documenting a separate environment-specific setup for each language is
> probably sufficient to address the issue.
>
> I also agree with Max that splitting the repo will cause more pain than
> gain.
>
> ~Ankur
>
>
>
> On Wed, Oct 10, 2018 at 7:56 AM Romain Manni-Bucau <rm...@gmail.com>
> wrote:
>
>>
>>
>>
>> On Wed, Oct 10, 2018 at 14:59, Maximilian Michels <mx...@apache.org>
>> wrote:
>>
>>> Hi,
>>>
>>> I agree that splitting up Beam into separate repositories would cause
>>> more pain than gain.
>>>
>>> To a large degree we already have independent modules, e.g. runners/* or
>>> sdks/*. Although this is not the case for the core. It would be
>>> desirable to break it up further.
>>>
>>
>> Think this part is ok for everyone.
>>
>>
>>>
>>>  > possibly even with their own build system (unified only through a
>>>  > top-level "build everything" script that descends into each subdir and
>>>  > runs the appropriate command).
>>>
>>> This is almost what we have. Yes, there are some dependencies on the
>>> Beam Gradle Plugin, but even if we had completely independent build
>>> directories, you'd still want to have a shared config/tasks across the
>>> projects (which might bring you back to a setup similar to what we have).
>>>
>>> One of the pain points seems to be the portability which "polluted" some
>>> parts of the project (e.g. legacy Runners). As mentioned in this thread
>>> that could have been solved with an abstraction. But the lack of
>>> abstraction also forced us to adopt the portable pipeline code quicker.
>>>
>>
>> Not at all. Assume we have a full build which does portability, then 3
>> concurrent builds (Go, Python, Java). We then still have the "current
>> step" in the CI, but developers are never affected by it and the build
>> does not mess up their machines either.
>>
>> Today the main blocker is that the default "profile" (script) does not
>> match the dev persona, and therefore there is no real hope of getting
>> external contributions from outside the Google-related people, as
>> mentioned with the previous figures, which is sad for a project promising
>> unification and work between communities IMHO.
>>
>>
>>>
>>> -Max
>>>
>>> On 10.10.18 10:51, Romain Manni-Bucau wrote:
>>> > Yep for the split
>>> >
>>> > For the clean point it is quite linked to the build tools and fake env
>>> > for not native modules for the build tool (go for gradle which is java
>>> > first for instance). This is why having a real build which is natural
>>> > per language would be beneficial IMO.
>>> >
>>> > On Wed, Oct 10, 2018 at 11:38, Jean-Baptiste Onofré <jb@nanthrax.net>
>>> > wrote:
>>> >
>>> >     Correct, it's more "module splitting" than repositories indeed.
>>> >
>>> >     Regards
>>> >     JB
>>> >
>>> >     On 10/10/2018 10:35, Robert Bradshaw wrote:
>>> >      > Gotcha. So this is more about dividing the code (particularly
>>> >     core) into
>>> >      > finer modules, rather than splitting the modules into separate
>>> >      > repositories, right?
>>> >      >
>>> >      > On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré
>>> >      > <jb@nanthrax.net> wrote:
>>> >      >
>>> >      >     The purpose is that we have a monolithic core today mostly
>>> >     providing
>>> >      >     abstract classes.
>>> >      >
>>> >      >     The idea is to have something more API oriented with
>>> >     interface/SPI.
>>> >      >
>>> >      >     Our users would then be able to pick the part of the core
>>> >     they want,
>>> >      >     resulting with lighter artifacts, and for us, it gives a
>>> more
>>> >     flexible
>>> >      >     approach.
>>> >      >
>>> >      >     Regards
>>> >      >     JB
>>> >      >
>>> >      >     On 10/10/2018 10:26, Robert Bradshaw wrote:
>>> >      >     > My question was not whether we should split the repo, but
>>> why?
>>> >      >     (Dividing
>>> >      >     > things into more (or fewer) modules withing a single repo
>>> is a
>>> >      >     separate
>>> >      >     > question.) Maybe I'm just not following what you mean by
>>> >     "more API
>>> >      >     > oriented." It would force stabler APIs.
>>> >      >     >
>>> >      >     > On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré
>>> >      >     > <jb@nanthrax.net> wrote:
>>> >      >     >
>>> >      >     >     Hi,
>>> >      >     >
>>> >      >     >     +1, even I think we could split the core even deeper.
>>> >      >     >
>>> >      >     >     I discussed with Luke and Reuven to introduce
>>> core-sql,
>>> >      >     core-schema,
>>> >      >     >     core-sdf, ...
>>> >      >     >
>>> >      >     >     It's not a huge effort, and would allow us to move
>>> >     forward on
>>> >      >     Beam "more
>>> >      >     >     API oriented" approach.
>>> >      >     >
>>> >      >     >     Regards
>>> >      >     >     JB
>>> >      >     >
>>> >      >     >     On 10/10/2018 10:12, Robert Bradshaw wrote:
>>> >      >     >     > Hi everyone,
>>> >      >     >     >
>>> >      >     >     > While IMHO it's too early to even be able to split
>>> >     the repo,
>>> >      >     it's
>>> >      >     >     not to
>>> >      >     >     > early to talk about it, and I wanted to spin this
>>> off to
>>> >      >     keep the
>>> >      >     >     other
>>> >      >     >     > thread focused.
>>> >      >     >     >
>>> >      >     >     > In particular, I am trying to figure out exactly
>>> what is
>>> >      >     hoped to be
>>> >      >     >     > gained by splitting things up. In my experience, a
>>> single
>>> >      >     project that
>>> >      >     >     > spans multiple repos has always come with excessive
>>> >     overhead
>>> >      >     and pain.
>>> >      >     >     > Of note, we recently merged the website and
>>> >     dataflow-worker
>>> >      >     into the
>>> >      >     >     > main repo *exactly* to avoid this pain (though the
>>> >     latter was
>>> >      >     >     > particularly bad due to one of the repos being
>>> private).
>>> >      >     >     >
>>> >      >     >     > If need be, I don't see any reason we can't have a
>>> single
>>> >      >     repo with
>>> >      >     >     > directories
>>> >      >     >     >
>>> >      >     >     > model/
>>> >      >     >     > website/
>>> >      >     >     > java/
>>> >      >     >     > go/
>>> >      >     >     > ...
>>> >      >     >     >
>>> >      >     >     > possibly even with their own build system (unified
>>> only
>>> >      >     through a
>>> >      >     >     > top-level "build everything" script that descends
>>> >     into each
>>> >      >     subdir and
>>> >      >     >     > runs the appropriate command). I'm not saying we
>>> >     should do
>>> >      >     this (there
>>> >      >     >     > is value in having a single consistent build system,
>>> >     etc.)
>>> >      >     but it's
>>> >      >     >     > possible. We could probably even make separate
>>> >     releases out
>>> >      >     of this
>>> >      >     >     > single repo (if we wanted, though given that our
>>> >     releases are
>>> >      >     >     time-based
>>> >      >     >     > rather than feature-based, I don't see much
>>> advantage
>>> >     here).
>>> >      >     >     >
>>> >      >     >     > Also, there was the comment.
>>> >      >     >     >
>>> >      >     >     > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
>>> >      >     >     > <rmannibucau@gmail.com> wrote:
>>> >      >     >     >>
>>> >      >     >     >> Side note: beam portability would be saner if added
>>> >     on top
>>> >      >     of others
>>> >      >     >     > than the opposite which is done today.
>>> >      >     >     >
>>> >      >     >     > I think you brought this up before, Romain. I'm
>>> still
>>> >     trying to
>>> >      >     >     wrap my
>>> >      >     >     > head around what you mean here. Could you elaborate
>>> >     what such a
>>> >      >     >     > structure would look like?
>>> >      >     >
>>> >      >     >     --
>>> >      >     >     Jean-Baptiste Onofré
>>> >      >     >     jbonofre@apache.org
>>> >      >     >     http://blog.nanthrax.net
>>> >      >     >     Talend - http://www.talend.com
>>> >      >     >
>>> >      >
>>> >      >     --
>>> >      >     Jean-Baptiste Onofré
>>> >      >     jbonofre@apache.org
>>> >      >     http://blog.nanthrax.net
>>> >      >     Talend - http://www.talend.com
>>> >      >
>>> >
>>> >     --
>>> >     Jean-Baptiste Onofré
>>> >     jbonofre@apache.org
>>> >     http://blog.nanthrax.net
>>> >     Talend - http://www.talend.com
>>> >
>>>
>>

Re: Splitting the repo

Posted by Ankur Goenka <go...@google.com>.
Hi,

I think the subtext here is that development is hard in general, and I
agree. Major causes are the diversity of languages, the complexity of the
project and legacy code.
To alleviate language-related issues, we are trying to keep the code
modular, which it already is to a certain extent.
Tooling, on the other hand, is still evolving and needs improvement. I also
feel that tooling is a moving target and it's good to keep reevaluating
it.
Tooling is a problem for everyone (the whole community) and we are actively
trying to solve it. Gradle is a big step in that direction.
I personally contribute to multiple languages. Many of the PRs have changes
spanning languages and have to be merged as a whole. I personally
feel that having a unified build system makes it easier to run the checks
and make sure things work.
Even after Gradle, I am still able to set up IntelliJ for Java, PyCharm for
Python and GoLand for Go as I would have done earlier (before Gradle). I am
also able to run "python setup.py sdist" as I was able to do before Gradle.
Gradle is also acting as the top-level task manager, and most of the Python
tasks are just plain shell commands stitched together.
The only real problem that I face in my setup is the vendored Java jars,
which only impact Java development.
Documenting a separate, environment-specific setup for each language is
probably sufficient to address the issue.
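
As a rough illustration of the "top-level task manager" idea discussed in
this thread (a driver that descends into each language directory and runs
that language's native command), here is a minimal sketch. It is not how
Beam's Gradle build works: the directory names follow the layout floated
earlier in the thread, and the commands in BUILD_COMMANDS, as well as the
function names plan_builds and build_everything, are hypothetical.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from a top-level directory to its native build command.
# These commands are illustrative assumptions, not Beam's real entry points.
BUILD_COMMANDS = {
    "java": ["./gradlew", "build"],
    "python": ["python", "setup.py", "sdist"],
    "go": ["go", "build", "./..."],
}


def plan_builds(repo_root, only=None):
    """Return (directory, command) pairs for the subdirectories that exist.

    `only` is an optional collection of directory names; it acts as a crude
    "profile" so a developer can skip languages they do not work on.
    """
    root = Path(repo_root)
    plan = []
    for name, command in BUILD_COMMANDS.items():
        if only is not None and name not in only:
            continue
        subdir = root / name
        if subdir.is_dir():
            plan.append((subdir, command))
    return plan


def build_everything(repo_root, only=None, dry_run=False):
    """Run each planned command in its own directory; stop on first failure."""
    for subdir, command in plan_builds(repo_root, only):
        print(f"[{subdir.name}] {' '.join(command)}")
        if not dry_run:
            # check=True raises CalledProcessError if the command fails.
            subprocess.run(command, cwd=subdir, check=True)
```

Calling build_everything(".", only={"python"}) would then build only the
Python directory, which is roughly the per-persona behaviour asked for in
the thread, while a CI job could call it without `only`.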

I also agree with Max that splitting the repo will cause more pain than
gain.

~Ankur



On Wed, Oct 10, 2018 at 7:56 AM Romain Manni-Bucau <rm...@gmail.com>
wrote:

>
>
>
> On Wed, Oct 10, 2018 at 14:59, Maximilian Michels <mx...@apache.org>
> wrote:
>
>> Hi,
>>
>> I agree that splitting up Beam into separate repositories would cause
>> more pain than gain.
>>
>> To a large degree we already have independent modules, e.g. runners/* or
>> sdks/*. Although this is not the case for the core. It would be
>> desirable to break it up further.
>>
>
> Think this part is ok for everyone.
>
>
>>
>>  > possibly even with their own build system (unified only through a
>>  > top-level "build everything" script that descends into each subdir and
>>  > runs the appropriate command).
>>
>> This is almost what we have. Yes, there are some dependencies on the
>> Beam Gradle Plugin, but even if we had completely independent build
>> directories, you'd still want to have a shared config/tasks across the
>> projects (which might bring you back to a setup similar to what we have).
>>
>> One of the pain points seems to be the portability which "polluted" some
>> parts of the project (e.g. legacy Runners). As mentioned in this thread
>> that could have been solved with an abstraction. But the lack of
>> abstraction also forced us to adopt the portable pipeline code quicker.
>>
>
> Not at all. Assume we have a full build which does portability, then 3
> concurrent builds (Go, Python, Java). We then still have the "current
> step" in the CI, but developers are never affected by it and the build
> does not mess up their machines either.
>
> Today the main blocker is that the default "profile" (script) does not
> match the dev persona, and therefore there is no real hope of getting
> external contributions from outside the Google-related people, as
> mentioned with the previous figures, which is sad for a project promising
> unification and work between communities IMHO.
>
>
>>
>> -Max
>>
>>
>

Re: Splitting the repo

Posted by Romain Manni-Bucau <rm...@gmail.com>.
On Wed, Oct 10, 2018 at 21:31, Robert Bradshaw <ro...@google.com> wrote:

>
>
> On Wed, Oct 10, 2018, 4:56 PM Romain Manni-Bucau <rm...@gmail.com>
> wrote:
>
>>
>>
>>
>> On Wed, Oct 10, 2018 at 14:59, Maximilian Michels <mx...@apache.org>
>> wrote:
>>
>>> Hi,
>>>
>>> I agree that splitting up Beam into separate repositories would cause
>>> more pain than gain.
>>>
>>> To a large degree we already have independent modules, e.g. runners/* or
>>> sdks/*. Although this is not the case for the core. It would be
>>> desirable to break it up further.
>>>
>>
>> Think this part is ok for everyone.
>>
>>
>>>
>>>  > possibly even with their own build system (unified only through a
>>>  > top-level "build everything" script that descends into each subdir and
>>>  > runs the appropriate command).
>>>
>>> This is almost what we have. Yes, there are some dependencies on the
>>> Beam Gradle Plugin, but even if we had completely independent build
>>> directories, you'd still want to have a shared config/tasks across the
>>> projects (which might bring you back to a setup similar to what we have).
>>>
>>> One of the pain points seems to be the portability which "polluted" some
>>> parts of the project (e.g. legacy Runners). As mentioned in this thread
>>> that could have been solved with an abstraction. But the lack of
>>> abstraction also forced us to adopt the portable pipeline code quicker.
>>>
>>
>> Not at all. Assume we have a full build which does portability, then 3
>> concurrent builds (Go, Python, Java). We then still have the "current
>> step" in the CI, but developers are never affected by it and the build
>> does not mess up their machines either.
>>
>
> I agree that no matter what, builds should not be messing up people's
> machines. (I hope they're not; if they are we should jump on fixing that
> right away.)
>

The Go build still creates symlinks pointing to themselves, which breaks in
some environments. I never checked why; forcing a clean is a workaround.



>
>
>> Today the main blocker is that the default "profile" (script) does not
>> match the dev persona, and therefore there is no real hope of getting
>> external contributions from outside the Google-related people, as
>> mentioned with the previous figures, which is sad for a project promising
>> unification and work between communities IMHO.
>>
>
> Trying to span different communities, especially those as diverse as the
> Java, Python, and Go (and hopefully other) ecosystems, is nontrivial; one
> must span different expectations, workflows, "dev personas," etc. This may
> require some compromise from all, but I am hopeful it will be minimal
> (e.g. there are some files in my repo and artifacts I had to build once
> when I built the world, but it just worked and I don't look at them...).
> But it's clear from the other thread that we need to fix the Java IDE
> experience, and possibly other things too, because it's not working out
> for everyone as well as it could.
>

So in the short term we go with "profiles" that skip modules?


Then we just need somebody who can tackle the IDEA integration issues (1.
import, 2. running tests without the Gradle runner) soon. These are the most
urgent blockers, and once they are fixed the language issues may become
more minor.



>
>
>>
>>>
>>> -Max
>>>
>>>
>>

Re: Splitting the repo

Posted by Robert Bradshaw <ro...@google.com>.
On Wed, Oct 10, 2018, 4:56 PM Romain Manni-Bucau <rm...@gmail.com>
wrote:

>
>
>
> On Wed, Oct 10, 2018 at 14:59, Maximilian Michels <mx...@apache.org>
> wrote:
>
>> Hi,
>>
>> I agree that splitting up Beam into separate repositories would cause
>> more pain than gain.
>>
>> To a large degree we already have independent modules, e.g. runners/* or
>> sdks/*. Although this is not the case for the core. It would be
>> desirable to break it up further.
>>
>
> Think this part is ok for everyone.
>
>
>>
>>  > possibly even with their own build system (unified only through a
>>  > top-level "build everything" script that descends into each subdir and
>>  > runs the appropriate command).
>>
>> This is almost what we have. Yes, there are some dependencies on the
>> Beam Gradle Plugin, but even if we had completely independent build
>> directories, you'd still want to have a shared config/tasks across the
>> projects (which might bring you back to a setup similar to what we have).
>>
>> One of the pain points seems to be the portability which "polluted" some
>> parts of the project (e.g. legacy Runners). As mentioned in this thread
>> that could have been solved with an abstraction. But the lack of
>> abstraction also forced us to adopt the portable pipeline code quicker.
>>
>
> Not at all. Assume we have a full build which does portability, then 3
> concurrent builds (Go, Python, Java). We then still have the "current
> step" in the CI, but developers are never affected by it and the build
> does not mess up their machines either.
>

I agree that no matter what, builds should not be messing up people's
machines. (I hope they're not; if they are we should jump on fixing that
right away.)


> Today the main blocker is that the default "profile" (script) does not
> match the dev persona, and therefore there is no real hope of getting
> external contributions from outside the Google-related people, as
> mentioned with the previous figures, which is sad for a project promising
> unification and work between communities IMHO.
>

Trying to span different communities, especially those as diverse as the
Java, Python, and Go (and hopefully other) ecosystems, is nontrivial; one
must span different expectations, workflows, "dev personas," etc. This may
require some compromise from all, but I am hopeful it will be minimal
(e.g. there are some files in my repo and artifacts I had to build once
when I built the world, but it just worked and I don't look at them...).
But it's clear from the other thread that we need to fix the Java IDE
experience, and possibly other things too, because it's not working out
for everyone as well as it could.



>
>>
>> -Max
>>
>> On 10.10.18 10:51, Romain Manni-Bucau wrote:
>> > Yep for the split
>> >
>> > For the clean point it is quite linked to the build tools and fake env
>> > for not native modules for the build tool (go for gradle which is java
>> > first for instance). This is why having a real build which is natural
>> > per language would be beneficial IMO.
>> >
>> > On Wed, Oct 10, 2018 at 11:38, Jean-Baptiste Onofré <jb@nanthrax.net
>> > <ma...@nanthrax.net>> wrote:
>> >
>> >     Correct, it's more "module splitting" than repositories indeed.
>> >
>> >     Regards
>> >     JB
>> >
>> >     On 10/10/2018 10:35, Robert Bradshaw wrote:
>> >      > Gotcha. So this is more about dividing the code (particularly
>> >     core) into
>> >      > finer modules, rather than splitting the modules into separate
>> >      > repositories, right?
>> >      >
>> >      > On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré
>> >     <jb@nanthrax.net <ma...@nanthrax.net>
>> >      > <mailto:jb@nanthrax.net <ma...@nanthrax.net>>> wrote:
>> >      >
>> >      >     The purpose is that we have a monolithic core today mostly
>> >     providing
>> >      >     abstract classes.
>> >      >
>> >      >     The idea is to have something more API oriented with
>> >     interface/SPI.
>> >      >
>> >      >     Our users would then be able to pick the part of the core
>> >     they want,
>> >      >     resulting with lighter artifacts, and for us, it gives a more
>> >     flexible
>> >      >     approach.
>> >      >
>> >      >     Regards
>> >      >     JB
>> >      >
>> >      >     On 10/10/2018 10:26, Robert Bradshaw wrote:
>> >      >     > My question was not whether we should split the repo, but
>> why?
>> >      >     (Dividing
>> >      >     > things into more (or fewer) modules withing a single repo
>> is a
>> >      >     separate
>> >      >     > question.) Maybe I'm just not following what you mean by
>> >     "more API
>> >      >     > oriented." It would force stabler APIs.
>> >      >     >
>> >      >     > On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré
>> >      >     <jb@nanthrax.net <ma...@nanthrax.net>
>> >     <mailto:jb@nanthrax.net <ma...@nanthrax.net>>
>> >      >     > <mailto:jb@nanthrax.net <ma...@nanthrax.net>
>> >     <mailto:jb@nanthrax.net <ma...@nanthrax.net>>>> wrote:
>> >      >     >
>> >      >     >     Hi,
>> >      >     >
>> >      >     >     +1, even I think we could split the core even deeper.
>> >      >     >
>> >      >     >     I discussed with Luke and Reuven to introduce core-sql,
>> >      >     core-schema,
>> >      >     >     core-sdf, ...
>> >      >     >
>> >      >     >     It's not a huge effort, and would allow us to move
>> >     forward on
>> >      >     Beam "more
>> >      >     >     API oriented" approach.
>> >      >     >
>> >      >     >     Regards
>> >      >     >     JB
>> >      >     >
>> >      >     >     On 10/10/2018 10:12, Robert Bradshaw wrote:
>> >      >     >     > Hi everyone,
>> >      >     >     >
>> >      >     >     > While IMHO it's too early to even be able to split
>> >     the repo,
>> >      >     it's
>> >      >     >     not to
>> >      >     >     > early to talk about it, and I wanted to spin this
>> off to
>> >      >     keep the
>> >      >     >     other
>> >      >     >     > thread focused.
>> >      >     >     >
>> >      >     >     > In particular, I am trying to figure out exactly
>> what is
>> >      >     hoped to be
>> >      >     >     > gained by splitting things up. In my experience, a
>> single
>> >      >     project that
>> >      >     >     > spans multiple repos has always come with excessive
>> >     overhead
>> >      >     and pain.
>> >      >     >     > Of note, we recently merged the website and
>> >     dataflow-worker
>> >      >     into the
>> >      >     >     > main repo *exactly* to avoid this pain (though the
>> >     latter was
>> >      >     >     > particularly bad due to one of the repos being
>> private).
>> >      >     >     >
>> >      >     >     > If need be, I don't see any reason we can't have a
>> single
>> >      >     repo with
>> >      >     >     > directories
>> >      >     >     >
>> >      >     >     > model/
>> >      >     >     > website/
>> >      >     >     > java/
>> >      >     >     > go/
>> >      >     >     > ...
>> >      >     >     >
>> >      >     >     > possibly even with their own build system (unified
>> only
>> >      >     through a
>> >      >     >     > top-level "build everything" script that descends
>> >     into each
>> >      >     subdir and
>> >      >     >     > runs the appropriate command). I'm not saying we
>> >     should do
>> >      >     this (there
>> >      >     >     > is value in having a single consistent build system,
>> >     etc.)
>> >      >     but it's
>> >      >     >     > possible. We could probably even make separate
>> >     releases out
>> >      >     of this
>> >      >     >     > single repo (if we wanted, though given that our
>> >     releases are
>> >      >     >     time-based
>> >      >     >     > rather than feature-based, I don't see much advantage
>> >     here).
>> >      >     >     >
>> >      >     >     > Also, there was the comment.
>> >      >     >     >
>> >      >     >     > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
>> >      >     >     > <rmannibucau@gmail.com <mailto:rmannibucau@gmail.com
>> >
>> >     <mailto:rmannibucau@gmail.com <ma...@gmail.com>>
>> >      >     <mailto:rmannibucau@gmail.com <ma...@gmail.com>
>> >     <mailto:rmannibucau@gmail.com <ma...@gmail.com>>>
>> >      >     >     <mailto:rmannibucau@gmail.com
>> >     <ma...@gmail.com> <mailto:rmannibucau@gmail.com
>> >     <ma...@gmail.com>>
>> >      >     <mailto:rmannibucau@gmail.com <ma...@gmail.com>
>> >     <mailto:rmannibucau@gmail.com <ma...@gmail.com>>>>>
>> wrote:
>> >      >     >     >>
>> >      >     >     >> Side note: beam portability would be saner if added
>> >     on top
>> >      >     of others
>> >      >     >     > than the opposite which is done today.
>> >      >     >     >
>> >      >     >     > I think you brought this up before, Romain. I'm still
>> >     trying to
>> >      >     >     wrap my
>> >      >     >     > head around what you mean here. Could you elaborate
>> >     what such a
>> >      >     >     > structure would look like?
>> >      >     >
>> >      >     >     --
>> >      >     >     Jean-Baptiste Onofré
>> >      >     > jbonofre@apache.org <ma...@apache.org>
>> >     <mailto:jbonofre@apache.org <ma...@apache.org>>
>> >      >     <mailto:jbonofre@apache.org <ma...@apache.org>
>> >     <mailto:jbonofre@apache.org <ma...@apache.org>>>
>> >      >     > http://blog.nanthrax.net
>> >      >     >     Talend - http://www.talend.com
>> >      >     >
>> >      >
>> >      >     --
>> >      >     Jean-Baptiste Onofré
>> >      > jbonofre@apache.org <ma...@apache.org>
>> >     <mailto:jbonofre@apache.org <ma...@apache.org>>
>> >      > http://blog.nanthrax.net
>> >      >     Talend - http://www.talend.com
>> >      >
>> >
>> >     --
>> >     Jean-Baptiste Onofré
>> >     jbonofre@apache.org <ma...@apache.org>
>> >     http://blog.nanthrax.net
>> >     Talend - http://www.talend.com
>> >
>>
>

Re: Splitting the repo

Posted by Romain Manni-Bucau <rm...@gmail.com>.
On Wed, Oct 10, 2018 at 14:59, Maximilian Michels <mx...@apache.org> wrote:

> Hi,
>
> I agree that splitting up Beam into separate repositories would cause
> more pain than gain.
>
> To a large degree we already have independent modules, e.g. runners/* or
> sdks/*. Although this is not the case for the core. It would be
> desirable to break it up further.
>

I think this part is OK for everyone.


>
>  > possibly even with their own build system (unified only through a
>  > top-level "build everything" script that descends into each subdir and
>  > runs the appropriate command).
>
> This is almost what we have. Yes, there are some dependencies on the
> Beam Gradle Plugin, but even if we had completely independent build
> directories, you'd still want to have a shared config/tasks across the
> projects (which might bring you back to a setup similar to what we have).
>
> One of the pain points seems to be the portability which "polluted" some
> parts of the project (e.g. legacy Runners). As mentioned in this thread
> that could have been solved with an abstraction. But the lack of
> abstraction also forced us to adopt the portable pipeline code quicker.
>

Not at all. Assume we have a full build that covers portability, followed by
3 concurrent builds (Go, Python, Java). Then we have the "current step" in
the CI, but devs are never affected by it and the build does not mess up
their machines either.

Today the main blocker is that the default "profile" (script) does not match
the dev persona, and therefore there is no real hope of getting external
contributions from outside Google-related guys, as mentioned by the previous
figures, which is sad for a project promising unification and work between
communities IMHO.


>
> -Max
>
> On 10.10.18 10:51, Romain Manni-Bucau wrote:
> > Yep for the split
> >
> > For the clean point it is quite linked to the build tools and fake env
> > for not native modules for the build tool (go for gradle which is java
> > first for instance). This is why having a real build which is natural
> > per language would be beneficial IMO.
> >
> > On Wed, Oct 10, 2018 at 11:38, Jean-Baptiste Onofré <jb@nanthrax.net
> > <ma...@nanthrax.net>> wrote:
> >
> >     Correct, it's more "module splitting" than repositories indeed.
> >
> >     Regards
> >     JB
> >
> >     On 10/10/2018 10:35, Robert Bradshaw wrote:
> >      > Gotcha. So this is more about dividing the code (particularly
> >     core) into
> >      > finer modules, rather than splitting the modules into separate
> >      > repositories, right?
> >      >
> >      > On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré
> >     <jb@nanthrax.net <ma...@nanthrax.net>
> >      > <mailto:jb@nanthrax.net <ma...@nanthrax.net>>> wrote:
> >      >
> >      >     The purpose is that we have a monolithic core today mostly
> >     providing
> >      >     abstract classes.
> >      >
> >      >     The idea is to have something more API oriented with
> >     interface/SPI.
> >      >
> >      >     Our users would then be able to pick the part of the core
> >     they want,
> >      >     resulting with lighter artifacts, and for us, it gives a more
> >     flexible
> >      >     approach.
> >      >
> >      >     Regards
> >      >     JB
> >      >
> >      >     On 10/10/2018 10:26, Robert Bradshaw wrote:
> >      >     > My question was not whether we should split the repo, but
> why?
> >      >     (Dividing
> >      >     > things into more (or fewer) modules withing a single repo
> is a
> >      >     separate
> >      >     > question.) Maybe I'm just not following what you mean by
> >     "more API
> >      >     > oriented." It would force stabler APIs.
> >      >     >
> >      >     > On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré
> >      >     <jb@nanthrax.net <ma...@nanthrax.net>
> >     <mailto:jb@nanthrax.net <ma...@nanthrax.net>>
> >      >     > <mailto:jb@nanthrax.net <ma...@nanthrax.net>
> >     <mailto:jb@nanthrax.net <ma...@nanthrax.net>>>> wrote:
> >      >     >
> >      >     >     Hi,
> >      >     >
> >      >     >     +1, even I think we could split the core even deeper.
> >      >     >
> >      >     >     I discussed with Luke and Reuven to introduce core-sql,
> >      >     core-schema,
> >      >     >     core-sdf, ...
> >      >     >
> >      >     >     It's not a huge effort, and would allow us to move
> >     forward on
> >      >     Beam "more
> >      >     >     API oriented" approach.
> >      >     >
> >      >     >     Regards
> >      >     >     JB
> >      >     >
> >      >     >     On 10/10/2018 10:12, Robert Bradshaw wrote:
> >      >     >     > Hi everyone,
> >      >     >     >
> >      >     >     > While IMHO it's too early to even be able to split
> >     the repo,
> >      >     it's
> >      >     >     not to
> >      >     >     > early to talk about it, and I wanted to spin this off
> to
> >      >     keep the
> >      >     >     other
> >      >     >     > thread focused.
> >      >     >     >
> >      >     >     > In particular, I am trying to figure out exactly what
> is
> >      >     hoped to be
> >      >     >     > gained by splitting things up. In my experience, a
> single
> >      >     project that
> >      >     >     > spans multiple repos has always come with excessive
> >     overhead
> >      >     and pain.
> >      >     >     > Of note, we recently merged the website and
> >     dataflow-worker
> >      >     into the
> >      >     >     > main repo *exactly* to avoid this pain (though the
> >     latter was
> >      >     >     > particularly bad due to one of the repos being
> private).
> >      >     >     >
> >      >     >     > If need be, I don't see any reason we can't have a
> single
> >      >     repo with
> >      >     >     > directories
> >      >     >     >
> >      >     >     > model/
> >      >     >     > website/
> >      >     >     > java/
> >      >     >     > go/
> >      >     >     > ...
> >      >     >     >
> >      >     >     > possibly even with their own build system (unified
> only
> >      >     through a
> >      >     >     > top-level "build everything" script that descends
> >     into each
> >      >     subdir and
> >      >     >     > runs the appropriate command). I'm not saying we
> >     should do
> >      >     this (there
> >      >     >     > is value in having a single consistent build system,
> >     etc.)
> >      >     but it's
> >      >     >     > possible. We could probably even make separate
> >     releases out
> >      >     of this
> >      >     >     > single repo (if we wanted, though given that our
> >     releases are
> >      >     >     time-based
> >      >     >     > rather than feature-based, I don't see much advantage
> >     here).
> >      >     >     >
> >      >     >     > Also, there was the comment.
> >      >     >     >
> >      >     >     > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
> >      >     >     > <rmannibucau@gmail.com <ma...@gmail.com>
> >     <mailto:rmannibucau@gmail.com <ma...@gmail.com>>
> >      >     <mailto:rmannibucau@gmail.com <ma...@gmail.com>
> >     <mailto:rmannibucau@gmail.com <ma...@gmail.com>>>
> >      >     >     <mailto:rmannibucau@gmail.com
> >     <ma...@gmail.com> <mailto:rmannibucau@gmail.com
> >     <ma...@gmail.com>>
> >      >     <mailto:rmannibucau@gmail.com <ma...@gmail.com>
> >     <mailto:rmannibucau@gmail.com <ma...@gmail.com>>>>>
> wrote:
> >      >     >     >>
> >      >     >     >> Side note: beam portability would be saner if added
> >     on top
> >      >     of others
> >      >     >     > than the opposite which is done today.
> >      >     >     >
> >      >     >     > I think you brought this up before, Romain. I'm still
> >     trying to
> >      >     >     wrap my
> >      >     >     > head around what you mean here. Could you elaborate
> >     what such a
> >      >     >     > structure would look like?
> >      >     >
> >      >     >     --
> >      >     >     Jean-Baptiste Onofré
> >      >     > jbonofre@apache.org <ma...@apache.org>
> >     <mailto:jbonofre@apache.org <ma...@apache.org>>
> >      >     <mailto:jbonofre@apache.org <ma...@apache.org>
> >     <mailto:jbonofre@apache.org <ma...@apache.org>>>
> >      >     > http://blog.nanthrax.net
> >      >     >     Talend - http://www.talend.com
> >      >     >
> >      >
> >      >     --
> >      >     Jean-Baptiste Onofré
> >      > jbonofre@apache.org <ma...@apache.org>
> >     <mailto:jbonofre@apache.org <ma...@apache.org>>
> >      > http://blog.nanthrax.net
> >      >     Talend - http://www.talend.com
> >      >
> >
> >     --
> >     Jean-Baptiste Onofré
> >     jbonofre@apache.org <ma...@apache.org>
> >     http://blog.nanthrax.net
> >     Talend - http://www.talend.com
> >
>

Re: Splitting the repo

Posted by Maximilian Michels <mx...@apache.org>.
Hi,

I agree that splitting up Beam into separate repositories would cause 
more pain than gain.

To a large degree we already have independent modules, e.g. runners/* or
sdks/*, although this is not the case for the core. It would be
desirable to break the core up further.

 > possibly even with their own build system (unified only through a
 > top-level "build everything" script that descends into each subdir and
 > runs the appropriate command).

This is almost what we have. Yes, there are some dependencies on the 
Beam Gradle Plugin, but even if we had completely independent build 
directories, you'd still want to have a shared config/tasks across the 
projects (which might bring you back to a setup similar to what we have).
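
As a rough illustration of the quoted "build everything" idea, here is a
minimal sketch in Python. The directory names come from the layout proposed
earlier in the thread; the per-directory commands and the function name
build_everything are assumptions made for illustration, not Beam's actual
build setup.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from top-level directory to its native build command.
# The directory names follow the layout quoted in the thread; the commands
# are illustrative assumptions, not the project's real invocations.
DEFAULT_COMMANDS = {
    "java": ["./gradlew", "build"],
    "go": ["go", "build", "./..."],
}


def build_everything(root, commands):
    """Run each subdirectory's build command and return {subdir: exit code}."""
    results = {}
    for subdir, command in commands.items():
        workdir = Path(root) / subdir
        if not workdir.is_dir():
            # Skip directories that are absent from this checkout.
            continue
        # Each language keeps its own tool; the script only descends and runs it.
        completed = subprocess.run(command, cwd=workdir)
        results[subdir] = completed.returncode
    return results
```

Anything shared across languages (common config or tasks) would still need
a home outside this loop, which is the concern raised above.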

One of the pain points seems to be portability, which "polluted" some
parts of the project (e.g. legacy Runners). As mentioned in this thread,
that could have been solved with an abstraction. But the lack of
abstraction also forced us to adopt the portable pipeline code more quickly.

-Max

On 10.10.18 10:51, Romain Manni-Bucau wrote:
> Yep for the split
> 
> For the clean point it is quite linked to the build tools and fake env 
> for not native modules for the build tool (go for gradle which is java 
> first for instance). This is why having a real build which is natural 
> per language would be beneficial IMO.
> 
> On Wed, Oct 10, 2018 at 11:38, Jean-Baptiste Onofré <jb@nanthrax.net
> <ma...@nanthrax.net>> wrote:
> 
>     Correct, it's more "module splitting" than repositories indeed.
> 
>     Regards
>     JB
> 
>     On 10/10/2018 10:35, Robert Bradshaw wrote:
>      > Gotcha. So this is more about dividing the code (particularly
>     core) into
>      > finer modules, rather than splitting the modules into separate
>      > repositories, right?
>      >
>      > On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré
>     <jb@nanthrax.net <ma...@nanthrax.net>
>      > <mailto:jb@nanthrax.net <ma...@nanthrax.net>>> wrote:
>      >
>      >     The purpose is that we have a monolithic core today mostly
>     providing
>      >     abstract classes.
>      >
>      >     The idea is to have something more API oriented with
>     interface/SPI.
>      >
>      >     Our users would then be able to pick the part of the core
>     they want,
>      >     resulting with lighter artifacts, and for us, it gives a more
>     flexible
>      >     approach.
>      >
>      >     Regards
>      >     JB
>      >
>      >     On 10/10/2018 10:26, Robert Bradshaw wrote:
>      >     > My question was not whether we should split the repo, but why?
>      >     (Dividing
>      >     > things into more (or fewer) modules withing a single repo is a
>      >     separate
>      >     > question.) Maybe I'm just not following what you mean by
>     "more API
>      >     > oriented." It would force stabler APIs.
>      >     >
>      >     > On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré
>      >     <jb@nanthrax.net <ma...@nanthrax.net>
>     <mailto:jb@nanthrax.net <ma...@nanthrax.net>>
>      >     > <mailto:jb@nanthrax.net <ma...@nanthrax.net>
>     <mailto:jb@nanthrax.net <ma...@nanthrax.net>>>> wrote:
>      >     >
>      >     >     Hi,
>      >     >
>      >     >     +1, even I think we could split the core even deeper.
>      >     >
>      >     >     I discussed with Luke and Reuven to introduce core-sql,
>      >     core-schema,
>      >     >     core-sdf, ...
>      >     >
>      >     >     It's not a huge effort, and would allow us to move
>     forward on
>      >     Beam "more
>      >     >     API oriented" approach.
>      >     >
>      >     >     Regards
>      >     >     JB
>      >     >
>      >     >     On 10/10/2018 10:12, Robert Bradshaw wrote:
>      >     >     > Hi everyone,
>      >     >     >
>      >     >     > While IMHO it's too early to even be able to split
>     the repo,
>      >     it's
>      >     >     not to
>      >     >     > early to talk about it, and I wanted to spin this off to
>      >     keep the
>      >     >     other
>      >     >     > thread focused.
>      >     >     >
>      >     >     > In particular, I am trying to figure out exactly what is
>      >     hoped to be
>      >     >     > gained by splitting things up. In my experience, a single
>      >     project that
>      >     >     > spans multiple repos has always come with excessive
>     overhead
>      >     and pain.
>      >     >     > Of note, we recently merged the website and
>     dataflow-worker
>      >     into the
>      >     >     > main repo *exactly* to avoid this pain (though the
>     latter was
>      >     >     > particularly bad due to one of the repos being private).
>      >     >     >
>      >     >     > If need be, I don't see any reason we can't have a single
>      >     repo with
>      >     >     > directories
>      >     >     >
>      >     >     > model/
>      >     >     > website/
>      >     >     > java/
>      >     >     > go/
>      >     >     > ...
>      >     >     >
>      >     >     > possibly even with their own build system (unified only
>      >     through a
>      >     >     > top-level "build everything" script that descends
>     into each
>      >     subdir and
>      >     >     > runs the appropriate command). I'm not saying we
>     should do
>      >     this (there
>      >     >     > is value in having a single consistent build system,
>     etc.)
>      >     but it's
>      >     >     > possible. We could probably even make separate
>     releases out
>      >     of this
>      >     >     > single repo (if we wanted, though given that our
>     releases are
>      >     >     time-based
>      >     >     > rather than feature-based, I don't see much advantage
>     here).
>      >     >     >
>      >     >     > Also, there was the comment.
>      >     >     >
>      >     >     > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
>      >     >     > <rmannibucau@gmail.com <ma...@gmail.com>
>     <mailto:rmannibucau@gmail.com <ma...@gmail.com>>
>      >     <mailto:rmannibucau@gmail.com <ma...@gmail.com>
>     <mailto:rmannibucau@gmail.com <ma...@gmail.com>>>
>      >     >     <mailto:rmannibucau@gmail.com
>     <ma...@gmail.com> <mailto:rmannibucau@gmail.com
>     <ma...@gmail.com>>
>      >     <mailto:rmannibucau@gmail.com <ma...@gmail.com>
>     <mailto:rmannibucau@gmail.com <ma...@gmail.com>>>>> wrote:
>      >     >     >>
>      >     >     >> Side note: beam portability would be saner if added
>     on top
>      >     of others
>      >     >     > than the opposite which is done today.
>      >     >     >
>      >     >     > I think you brought this up before, Romain. I'm still
>     trying to
>      >     >     wrap my
>      >     >     > head around what you mean here. Could you elaborate
>     what such a
>      >     >     > structure would look like?
>      >     >
>      >     >     --
>      >     >     Jean-Baptiste Onofré
>      >     > jbonofre@apache.org <ma...@apache.org>
>     <mailto:jbonofre@apache.org <ma...@apache.org>>
>      >     <mailto:jbonofre@apache.org <ma...@apache.org>
>     <mailto:jbonofre@apache.org <ma...@apache.org>>>
>      >     > http://blog.nanthrax.net
>      >     >     Talend - http://www.talend.com
>      >     >
>      >
>      >     --
>      >     Jean-Baptiste Onofré
>      > jbonofre@apache.org <ma...@apache.org>
>     <mailto:jbonofre@apache.org <ma...@apache.org>>
>      > http://blog.nanthrax.net
>      >     Talend - http://www.talend.com
>      >
> 
>     -- 
>     Jean-Baptiste Onofré
>     jbonofre@apache.org <ma...@apache.org>
>     http://blog.nanthrax.net
>     Talend - http://www.talend.com
> 

Re: Splitting the repo

Posted by Romain Manni-Bucau <rm...@gmail.com>.
Yes to the split.

As for the clean point, it is closely tied to the build tools and the fake
environments needed for modules that are not native to the build tool (for
instance Go under Gradle, which is Java-first). This is why having a real
build that is natural for each language would be beneficial IMO.

On Wed, Oct 10, 2018 at 11:38, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:

> Correct, it's more "module splitting" than repositories indeed.
>
> Regards
> JB
>
> On 10/10/2018 10:35, Robert Bradshaw wrote:
> > Gotcha. So this is more about dividing the code (particularly core) into
> > finer modules, rather than splitting the modules into separate
> > repositories, right?
> >
> > On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré <jb@nanthrax.net
> > <ma...@nanthrax.net>> wrote:
> >
> >     The purpose is that we have a monolithic core today mostly providing
> >     abstract classes.
> >
> >     The idea is to have something more API oriented with interface/SPI.
> >
> >     Our users would then be able to pick the part of the core they want,
> >     resulting with lighter artifacts, and for us, it gives a more
> flexible
> >     approach.
> >
> >     Regards
> >     JB
> >
> >     On 10/10/2018 10:26, Robert Bradshaw wrote:
> >     > My question was not whether we should split the repo, but why?
> >     (Dividing
> >     > things into more (or fewer) modules withing a single repo is a
> >     separate
> >     > question.) Maybe I'm just not following what you mean by "more API
> >     > oriented." It would force stabler APIs.
> >     >
> >     > On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré
> >     <jb@nanthrax.net <ma...@nanthrax.net>
> >     > <mailto:jb@nanthrax.net <ma...@nanthrax.net>>> wrote:
> >     >
> >     >     Hi,
> >     >
> >     >     +1, even I think we could split the core even deeper.
> >     >
> >     >     I discussed with Luke and Reuven to introduce core-sql,
> >     core-schema,
> >     >     core-sdf, ...
> >     >
> >     >     It's not a huge effort, and would allow us to move forward on
> >     Beam "more
> >     >     API oriented" approach.
> >     >
> >     >     Regards
> >     >     JB
> >     >
> >     >     On 10/10/2018 10:12, Robert Bradshaw wrote:
> >     >     > Hi everyone,
> >     >     >
> >     >     > While IMHO it's too early to even be able to split the repo,
> >     it's
> >     >     not to
> >     >     > early to talk about it, and I wanted to spin this off to
> >     keep the
> >     >     other
> >     >     > thread focused.
> >     >     >
> >     >     > In particular, I am trying to figure out exactly what is
> >     hoped to be
> >     >     > gained by splitting things up. In my experience, a single
> >     project that
> >     >     > spans multiple repos has always come with excessive overhead
> >     and pain.
> >     >     > Of note, we recently merged the website and dataflow-worker
> >     into the
> >     >     > main repo *exactly* to avoid this pain (though the latter was
> >     >     > particularly bad due to one of the repos being private).
> >     >     >
> >     >     > If need be, I don't see any reason we can't have a single
> >     repo with
> >     >     > directories
> >     >     >
> >     >     > model/
> >     >     > website/
> >     >     > java/
> >     >     > go/
> >     >     > ...
> >     >     >
> >     >     > possibly even with their own build system (unified only
> >     through a
> >     >     > top-level "build everything" script that descends into each
> >     subdir and
> >     >     > runs the appropriate command). I'm not saying we should do
> >     this (there
> >     >     > is value in having a single consistent build system, etc.)
> >     but it's
> >     >     > possible. We could probably even make separate releases out
> >     of this
> >     >     > single repo (if we wanted, though given that our releases are
> >     >     time-based
> >     >     > rather than feature-based, I don't see much advantage here).
> >     >     >
> >     >     > Also, there was the comment.
> >     >     >
> >     >     > On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
> >     >     > <rmannibucau@gmail.com <ma...@gmail.com>
> >     <mailto:rmannibucau@gmail.com <ma...@gmail.com>>
> >     >     <mailto:rmannibucau@gmail.com <ma...@gmail.com>
> >     <mailto:rmannibucau@gmail.com <ma...@gmail.com>>>>
> wrote:
> >     >     >>
> >     >     >> Side note: beam portability would be saner if added on top
> >     of others
> >     >     > than the opposite which is done today.
> >     >     >
> >     >     > I think you brought this up before, Romain. I'm still trying
> to
> >     >     wrap my
> >     >     > head around what you mean here. Could you elaborate what
> such a
> >     >     > structure would look like?
> >     >
> >     >     --
> >     >     Jean-Baptiste Onofré
> >     >     jbonofre@apache.org <ma...@apache.org>
> >     <mailto:jbonofre@apache.org <ma...@apache.org>>
> >     >     http://blog.nanthrax.net
> >     >     Talend - http://www.talend.com
> >     >
> >
> >     --
> >     Jean-Baptiste Onofré
> >     jbonofre@apache.org <ma...@apache.org>
> >     http://blog.nanthrax.net
> >     Talend - http://www.talend.com
> >
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: Splitting the repo

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Correct, it's more "module splitting" than repositories indeed.

Regards
JB

On 10/10/2018 10:35, Robert Bradshaw wrote:
> Gotcha. So this is more about dividing the code (particularly core) into
> finer modules, rather than splitting the modules into separate
> repositories, right? 

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Splitting the repo

Posted by Robert Bradshaw <ro...@google.com>.
Gotcha. So this is more about dividing the code (particularly core) into
finer modules, rather than splitting the modules into separate
repositories, right?

On Wed, Oct 10, 2018 at 10:29 AM Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> The purpose is that we have a monolithic core today mostly providing
> abstract classes.
>
> The idea is to have something more API oriented with interface/SPI.
>
> Our users would then be able to pick the part of the core they want,
> resulting in lighter artifacts, and for us, it gives a more flexible
> approach.
>
> Regards
> JB

Re: Splitting the repo

Posted by Robert Bradshaw <ro...@google.com>.
On Wed, Oct 10, 2018 at 10:35 AM Romain Manni-Bucau <rm...@gmail.com>
wrote:

> Also, we could use a better-suited build tool for each area and not break
> the repo on each build. The Go and Python builds always require a git clean
> for Java users, which is a big issue, so let's build each subproject (which
> is what Beam consists of today) as it should be built, with a suitable tool.
>

If this is the case, that should be fixed. I can't remember the last time I
did a git clean, so clearly things are not working as well for you as for
me.


> It requires very few validations, but it is trivial to add unit tests to
> ensure nothing is broken at these contact points.

Re: Splitting the repo

Posted by Romain Manni-Bucau <rm...@gmail.com>.
Also, we could use a better-suited build tool for each area and not break
the repo on each build. The Go and Python builds always require a git clean
for Java users, which is a big issue, so let's build each subproject (which
is what Beam consists of today) as it should be built, with a suitable tool.

It requires very few validations, but it is trivial to add unit tests to
ensure nothing is broken at these contact points.

On Wed, Oct 10, 2018 at 11:29, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:

> The purpose is that we have a monolithic core today mostly providing
> abstract classes.
>
> The idea is to have something more API oriented with interface/SPI.
>
> Our users would then be able to pick the part of the core they want,
> resulting in lighter artifacts, and for us, it gives a more flexible
> approach.
>
> Regards
> JB

Re: Splitting the repo

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
The purpose is that we have a monolithic core today mostly providing
abstract classes.

The idea is to have something more API oriented with interface/SPI.

Our users would then be able to pick the part of the core they want,
resulting in lighter artifacts, and for us, it gives a more flexible
approach.
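
To make "interface/SPI" a bit more concrete, here is a minimal sketch using
the JDK's java.util.ServiceLoader. All names in it (SpiSketch, CoreExtension,
SqlExtension) are hypothetical and are not existing Beam classes; it only
illustrates the general pattern, not an actual design for the core.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class SpiSketch {

    /** Hypothetical extension point that a small API-only artifact might expose. */
    public interface CoreExtension {
        String name();
    }

    /** Hypothetical implementation that an optional module (e.g. core-sql) could ship. */
    public static final class SqlExtension implements CoreExtension {
        @Override
        public String name() {
            return "sql";
        }
    }

    /**
     * Returns the names of all CoreExtension implementations registered through
     * META-INF/services on the classpath. With nothing registered, the list is empty.
     */
    public static List<String> discover() {
        List<String> names = new ArrayList<>();
        for (CoreExtension extension : ServiceLoader.load(CoreExtension.class)) {
            names.add(extension.name());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("extensions found: " + discover());
    }
}
```

An optional module would register its implementation by shipping a
META-INF/services file named after the interface's binary name and listing
the implementation class. Users who do not depend on that module would not
pull it in, which is the "lighter artifacts" point above.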

Regards
JB

On 10/10/2018 10:26, Robert Bradshaw wrote:
> My question was not whether we should split the repo, but why? (Dividing
> things into more (or fewer) modules within a single repo is a separate
> question.) Maybe I'm just not following what you mean by "more API
> oriented." It would force stabler APIs.

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Splitting the repo

Posted by Robert Bradshaw <ro...@google.com>.
My question was not whether we should split the repo, but why? (Dividing
things into more (or fewer) modules within a single repo is a separate
question.) Maybe I'm just not following what you mean by "more API
oriented." It would force stabler APIs.

On Wed, Oct 10, 2018 at 10:18 AM Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi,
>
> +1, I even think we could split the core further.
>
> I discussed introducing core-sql, core-schema, core-sdf, ... with Luke
> and Reuven.
>
> It's not a huge effort, and would allow us to move forward on a "more
> API oriented" approach for Beam.
>
> Regards
> JB

Re: Splitting the repo

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi,

+1, I even think we could split the core further.

I discussed introducing core-sql, core-schema, core-sdf, ... with Luke
and Reuven.

It's not a huge effort, and would allow us to move forward on a "more
API oriented" approach for Beam.

Regards
JB

On 10/10/2018 10:12, Robert Bradshaw wrote:
> Hi everyone,
> 
> While IMHO it's too early to even be able to split the repo, it's not too
> early to talk about it, and I wanted to spin this off to keep the other
> thread focused.
> 
> In particular, I am trying to figure out exactly what is hoped to be
> gained by splitting things up. In my experience, a single project that
> spans multiple repos has always come with excessive overhead and pain.
> Of note, we recently merged the website and dataflow-worker into the
> main repo *exactly* to avoid this pain (though the latter was
> particularly bad due to one of the repos being private).
> 
> If need be, I don't see any reason we can't have a single repo with
> directories
> 
> model/
> website/
> java/
> go/
> ...
> 
> possibly even with their own build system (unified only through a
> top-level "build everything" script that descends into each subdir and
> runs the appropriate command). I'm not saying we should do this (there
> is value in having a single consistent build system, etc.) but it's
> possible. We could probably even make separate releases out of this
> single repo (if we wanted, though given that our releases are time-based
> rather than feature-based, I don't see much advantage here).
> 
> Also, there was the comment.
> 
> On Wed, Oct 10, 2018 at 7:35 AM Romain Manni-Bucau
> <rmannibucau@gmail.com> wrote:
>>
>> Side note: beam portability would be saner if added on top of others
>> than the opposite which is done today.
> 
> I think you brought this up before, Romain. I'm still trying to wrap my
> head around what you mean here. Could you elaborate what such a
> structure would look like? 

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com