Posted to dev@predictionio.apache.org by Simon Chan <si...@salesforce.com> on 2016/11/03 21:16:10 UTC

consolidate repos into 1

Hi guys,

I'm actually thinking we should consolidate all the core template / SDK repos
that were donated to Apache (i.e.
https://github.com/search?q=org%3Aapache+PredictionIO) into one main repo
(https://github.com/apache/incubator-predictionio).

The benefits may be that:
1. We can track Apache PredictionIO project activity in a unified place;
2. Making these templates part of the main repo encourages contributors to
make sure they are all compatible with the latest version of PredictionIO
core;
3. I don't see other projects (e.g. Mahout and its libraries) hosting core
and components separately.

Thoughts?

Simon

Re: consolidate repos into 1

Posted by Kenneth Chan <ke...@apache.org>.
Sorry if I'm too late to this discussion.

Actually, I have also wondered whether it makes sense to merge the official
templates into one repo.

The reasons are:
- Think of them as a library or bundle of PIO. Every time we release a new
version of PIO, we also need to make sure these templates still work or are
upgraded together, just like Spark and Spark MLlib.
- It may be easier to maintain as well?

As for the concern about the large size of the text classification template:
it is big because at one point a bunch of binary jars were checked in. I
think we should recreate that repo, clean that up, and include instructions
on how to get those jars.



Re: consolidate repos into 1

Posted by Pat Ferrel <pa...@occamsmachete.com>.
I wouldn’t favor merging, for Tom’s reason and others:
For the template I maintain, there have so far been 2 PIO releases and soon there will be 7 template releases. The point is that active templates will have their own revision schedule. You have only to look at the history of the templates to see that they are released independently of PIO releases. It is the ASF tools that make this hard, not the project's needs.
These were all separate repos in the PIO days because they made sense as separate repos and because GitHub makes it easy. Now with ASF-hosted git there is more pain but still the same project needs. Let’s not confuse pain with need. Let’s remove the pain points. We already have self-service repo creation from pushing on the pain points, a big step forward from the days when it took an infra ticket to get a repo.
If `git pull template-url` is the basis of getting a template, merging repos will break this and make contributed templates different from external ones, to the confusion of users.
As Tom noted, it will also bloat the project when we’d like to see it more modular. For instance, an Admin server microservice may also end up in a separate repo so it can be released at different intervals.
The standard, IMO, is not Apache, which is a venerable institution (trying to remove friction points); it is outside-Apache OSS, which most assuredly is more modular: pip, npm, gems, apt-get, ...

Growth leads to bloat or to efforts to decouple and refactor. I’d actually like to see PIO split up along microservices refactoring lines, but all in time. A move to bundle things together seems the wrong direction.

Another problem is the difficulty of binary releases in ASF, as we all witnessed (especially hard for incubating projects). Think about the fact that currently templates do not need to be released in any sense. Wow, that is very cool, speaking from the ASF red-tape avoidance part of me.




Re: consolidate repos into 1

Posted by Tom Chan <yu...@gmail.com>.
This is mostly a good idea, but one of the templates is 3 times the
size of incubator-predictionio:

$ du -d 1 -h
 53M ./incubator-predictionio
1.3M ./incubator-predictionio-sdk-java
288K ./incubator-predictionio-sdk-php
536K ./incubator-predictionio-sdk-python
264K ./incubator-predictionio-sdk-ruby
236K ./incubator-predictionio-template-attribute-based-classifier
220K ./incubator-predictionio-template-ecom-recommender
264K ./incubator-predictionio-template-java-ecom-recommender
184K ./incubator-predictionio-template-recommender
196K ./incubator-predictionio-template-similar-product
440K ./incubator-predictionio-template-skeleton
160M ./incubator-predictionio-template-text-classifier

If we choose to consolidate them all into one repo, this 160M will be
downloaded by all users whether or not they use it.

Tom
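
The figures in the `du` listing above can be checked with a minimal Python sketch. It assumes the listing is pasted in as text and that `K`/`M` are binary units; `parse_du` and `UNIT_TO_KIB` are illustrative names, not part of any PredictionIO tooling. Because `du -h` rounds its output and measures the working tree on disk rather than the download size, the results are only rough.

```python
import re

# Sizes as reported by `du -d 1 -h` in the message above (copied verbatim).
DU_OUTPUT = """\
 53M ./incubator-predictionio
1.3M ./incubator-predictionio-sdk-java
288K ./incubator-predictionio-sdk-php
536K ./incubator-predictionio-sdk-python
264K ./incubator-predictionio-sdk-ruby
236K ./incubator-predictionio-template-attribute-based-classifier
220K ./incubator-predictionio-template-ecom-recommender
264K ./incubator-predictionio-template-java-ecom-recommender
184K ./incubator-predictionio-template-recommender
196K ./incubator-predictionio-template-similar-product
440K ./incubator-predictionio-template-skeleton
160M ./incubator-predictionio-template-text-classifier
"""

# Assumption: du -h suffixes are powers of 1024.
UNIT_TO_KIB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def parse_du(text):
    """Return {directory name: size in KiB} from `du -h` style lines."""
    sizes = {}
    for line in text.splitlines():
        match = re.match(r"\s*([\d.]+)([KMG])\s+\./(\S+)", line)
        if match:
            number, unit, name = match.groups()
            sizes[name] = float(number) * UNIT_TO_KIB[unit]
    return sizes


sizes = parse_du(DU_OUTPUT)
core = sizes["incubator-predictionio"]
text_classifier = sizes["incubator-predictionio-template-text-classifier"]
total = sum(sizes.values())

# How much larger the text classifier template is than the core repo.
ratio = text_classifier / core
# Fraction of a hypothetical merged repo taken up by that one template.
share = text_classifier / total

print(f"text-classifier vs core: {ratio:.1f}x")
print(f"text-classifier share of a merged repo: {share:.0%}")
```

On these numbers the text classifier template alone would account for roughly three quarters of a merged checkout.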
