Posted to dev@flink.apache.org by Alan Gates <ga...@hortonworks.com> on 2014/08/15 19:24:48 UTC

Question on providing CDH packages

Let me begin by noting that I obviously have a conflict of interest 
since my company is a direct competitor to Cloudera.  But as a mentor 
and Apache member I believe I need to bring this up.

What is the Apache policy towards having a vendor specific package on a 
download site?  It is strange to me to come to Flink's website and see 
packages for Flink with CDH (or HDP or MapR or whatever).  We should 
avoid providing vendor specific packages.  It gives the appearance of 
preferring one vendor over another, which Apache does not want to do.

I have no problem at all with Cloudera hosting a CDH specific package of 
Flink, nor with Flink project members working with Cloudera to create 
such a package.  But I do not think they should be hosted at Apache.

Alan.
-- 
Sent with Postbox <http://www.getpostbox.com>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Question on providing CDH packages

Posted by Henry Saputra <he...@gmail.com>.
Agree with Robert.

The ASF only releases source code, so the binary packages are just a
convenience from Flink that targets specific Hadoop vendors.

If you look at the Apache Spark download page [1], they do the same
thing by providing distro-specific binaries.

AFAIK this should NOT be a problem, and it especially should not block the release.

Thanks,

Henry

[1] http://spark.apache.org/downloads.html


Re: Question on providing CDH packages

Posted by Alan Gates <ga...@hortonworks.com>.
+1 to not holding the release on this.  Since the release is only the 
source*, if we later decide that CDH-specific packages are OK, we can add 
them in without extra votes, etc.

Alan.

*Apache releases only source code.  This is so that users, 
distributors, etc. can verify the integrity of the code.  Binary 
packages are a convenience only, for users who are willing to trust us 
without inspecting the code themselves.
> I'm happy (if the others agree) to remove the cdh4 binary from the release
> and delay the discussion after the release.


Re: Question on providing CDH packages

Posted by Robert Metzger <rm...@apache.org>.
Hi,

I'm glad you've brought this topic up (thank you also for checking the
release!).
I used Spark's release script as a reference for creating ours (why
reinvent the wheel when they have excellent infrastructure), and it had a
CDH4 profile, so I thought it was okay for Apache projects to have these
special builds.

Let me explain the technical background (I hope all the information
here is correct; correct me if I'm wrong):
There are two components inside Flink that have dependencies on Hadoop:
a) HDFS and b) YARN.

Usually, users who have a Hadoop version like 0.2x or 1.x can use our
"hadoop1" builds. They contain the hadoop1 HDFS client and no YARN support.
Users with old CDH versions (pre-4, I believe), Hortonworks, or MapR can
also use these builds.
Users with newer vendor distributions (HDP2, CDH5, ...) can use our
"hadoop2" build. It contains the newer HDFS client (protobuf-based RPC)
and has support for the new YARN API (2.2.0 onwards).
So the "hadoop1" and "hadoop2" builds probably cover most of the cases
users have.
Then there is CDH4, which contains an "unreleased" Hadoop 2.0.0 version. It
has the new HDFS client (protobuf) but the old YARN API (2.1.0-beta or
so), which we don't support. Therefore, those users cannot use the
"hadoop1" build (wrong HDFS client), and the "hadoop2" build is not
compatible with their YARN.
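The compatibility rules above can be summarized as a tiny chooser function. This is a sketch only; the function name and the exact version boundaries are my reading of this email, not an official Flink support matrix:

```python
def pick_flink_build(hadoop_version):
    """Map a Hadoop version string to a Flink binary build, per the rules above."""
    major, minor = (int(p) for p in hadoop_version.split(".")[:2])
    if major < 2:
        return "hadoop1"   # old HDFS client, no YARN support (Hadoop 0.2x / 1.x)
    if (major, minor) >= (2, 2):
        return "hadoop2"   # protobuf-based HDFS client + the 2.2.0-onwards YARN API
    return None            # the CDH4 gap: Hadoop 2.0.x (new HDFS client, old YARN API)
```

For example, Hadoop 1.2.1 maps to "hadoop1", 2.4.0 maps to "hadoop2", and a CDH4-style 2.0.0 maps to no prebuilt binary at all, which is exactly the gap the cdh4 package was meant to fill.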

If you have a look at the Spark downloads page, you'll find the following
(apache-hosted?) binary builds:



   - For Hadoop 1 (HDP1, CDH3): find an Apache mirror
   <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-hadoop1.tgz>
   or direct file download
   <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-hadoop1.tgz>
   - For CDH4: find an Apache mirror
   <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-cdh4.tgz>
   or direct file download
   <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-cdh4.tgz>
   - For Hadoop 2 (HDP2, CDH5): find an Apache mirror
   <http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.2/spark-1.0.2-bin-hadoop2.tgz>
   or direct file download
   <http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-hadoop2.tgz>


I think this choice of binaries reflects what I've explained above.

I'm happy (if the others agree) to remove the cdh4 binary from the release
and delay the discussion after the release.

Best,
Robert




On Fri, Aug 15, 2014 at 8:01 PM, Owen O'Malley <om...@apache.org> wrote:

> As a mentor, I agree that vendor specific packages aren't appropriate for
> the Apache site. (Disclosure: I work at Hortonworks.) Working with the
> vendors to make packages available is great, but they shouldn't be hosted
> at Apache.
>
> .. Owen
>
>
> On Fri, Aug 15, 2014 at 10:32 AM, Sean Owen <sr...@gmail.com> wrote:
>
> > I hope not surprisingly, I agree. (Backstory: I am at Cloudera.) I
> > have for example lobbied Spark to remove CDH-specific releases and
> > build profiles. Not just for this reason, but because it is often
> > unnecessary to have vendor-specific builds, and also just increases
> > maintenance overhead for the project.
> >
> > Matei et al say they want to make it as easy as possible to consume
> > Spark, and so provide vendor-build-specific artifacts and such here
> > and there. To be fair, Spark tries to support a large range of Hadoop
> > and YARN versions, and getting the right combination of profiles and
> > versions right to recreate a vendor release was kind of hard until
> > about Hadoop 2.2 (stable YARN really).
> >
> > I haven't heard of any formal policy. I would ask whether there are
> > similar reasons to produce pre-packaged releases like so?

Re: Question on providing CDH packages

Posted by Robert Metzger <rm...@apache.org>.
Just for the record: I was able to build a Flink version that is compatible
with CDH4 by using the official Hadoop 2.0.0-alpha release. So with the next
release (0.6.1-incubating or 0.7-incubating) we can ship an additional
"hadoop200alpha" binary package that works for users whose distro is based
on this version. It won't include any vendor-specific versions or binaries.
See also: https://issues.apache.org/jira/browse/FLINK-1068
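With this result, the build chooser gains a third option for Hadoop-2.0-based distros. A sketch only: the "hadoop200alpha" package name comes from this email, but the version boundaries are my assumption:

```python
def pick_flink_build(hadoop_version):
    """Choose a Flink binary package, including the planned "hadoop200alpha" build."""
    major, minor = (int(p) for p in hadoop_version.split(".")[:2])
    if major < 2:
        return "hadoop1"
    if (major, minor) >= (2, 2):
        return "hadoop2"
    # Hadoop 2.0.x / 2.1.x-beta era, e.g. the base of CDH4:
    return "hadoop200alpha"
```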



Re: Question on providing CDH packages

Posted by Alan Gates <ga...@hortonworks.com>.
No objections.  That seems like a good way to help our users while avoiding
the appearance of favoring one vendor over another.

Alan.



Re: Question on providing CDH packages

Posted by Robert Metzger <rm...@apache.org>.
Supporting the Hadoop 2.0 (not 2.2) YARN API would be a lot of coding
effort; there was a huge API change between the two versions.
Maybe we can find a technical solution to this political/legal problem: I'm
going to build a Flink version against the "2.1.1-beta" (or similar)
official Apache Hadoop release and see whether that works with CDH4 as well.
Then we can provide a non-vendor-specific binary that still solves the
problem for our users.
Our problem is not as severe as Spark's, since they have (in my
understanding) support for both YARN APIs. So our issue with CDH4 /
Hadoop 2.1-beta is only related to the HDFS client, not the whole YARN API.




Re: Question on providing CDH packages

Posted by Stephan Ewen <se...@apache.org>.
I like Sean's idea very much: Creating the three packages (Hadoop 1.x,
Hadoop 2.x, Hadoop 2.0 with Yarn beta).

Any objections to creating a help site that says "For that vendor with this
version pick the following binary release" ?

Stephan




Re: Question on providing CDH packages

Posted by Henry Saputra <he...@gmail.com>.
As for Flink: for now, the additional CDH4 packaged binary is there to
support a "non-standard" Hadoop version that some customers may already
have.

Given that this is "not a question of supporting a vendor but a Hadoop
version combo", does the approach Flink has taken to help customers get
up and running quickly seem fair and like a good idea?

There has been a lot of discussion about ASF release artifacts, and the
consistent answer is that the ASF validates releases of source code, not
binaries.
Binaries are released only as a convenience to users, which is what
Flink is doing with the different Hadoop versions.

- Henry


Re: Question on providing CDH packages

Posted by Sean Owen <sr...@gmail.com>.
It's probably the same thing as with Spark. Spark doesn't actually
work with YARN 'beta'-era releases, but works 'stable' and specially
supports 'alpha'. CDH 4.{2-4} or so == YARN 'beta' (not non-standard,
but, is probably the only distro of it you'll still run into in
circulation). (And so it's kind of unhelpful that Spark has build
instructions for CDH 4.2 + YARN.) Yeah, that's the thing you may
handle as a corner case, or not handle and punt to the vendor. But
even that -- if that's the same issue -- it's not a question of
supporting a vendor but a Hadoop version combo.
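Sean's characterization of Spark's YARN support can be restated as a small table. This is a sketch of his "or so" summary, not anything from Spark's documentation:

```python
# Spark's support for the three YARN API eras, per Sean's summary above:
# 'alpha' is specially supported, 'stable' (Hadoop 2.2+) works, and
# 'beta' (roughly what CDH 4.2-4.4 shipped) does not.
SPARK_YARN_SUPPORT = {"alpha": True, "beta": False, "stable": True}

def spark_supports_yarn(era):
    return SPARK_YARN_SUPPORT[era]
```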


Re: Question on providing CDH packages

Posted by Stephan Ewen <se...@apache.org>.
I think the main problem was that CDH4 is a non-standard build. All the
others we tried worked with the hadoop-1.2 and 2.2/2.4 builds.

But I understand your points.

So, instead of creating those packages, we can make a guide on "how to pick
the right distribution", which points you to the hadoop-1.2 and 2.2/2.4
builds. For some cases, the guide will ask you to compile your own.
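Such a guide could boil down to a lookup from vendor distribution to recommended download. A sketch: the HDP1/CDH3 → hadoop1 and HDP2/CDH5 → hadoop2 pairings come from earlier in this thread, while the dictionary itself is hypothetical:

```python
# Vendor distribution -> recommended Flink binary, per the thread's discussion.
VENDOR_BUILD_GUIDE = {
    "HDP1": "hadoop-1.2",
    "CDH3": "hadoop-1.2",
    "HDP2": "hadoop-2.2/2.4",
    "CDH5": "hadoop-2.2/2.4",
    "CDH4": "compile-your-own",  # ships a non-standard Hadoop 2.0.0
}

def recommended_build(distro):
    # Anything not explicitly verified falls back to compiling from source.
    return VENDOR_BUILD_GUIDE.get(distro, "compile-your-own")
```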





On Mon, Aug 18, 2014 at 6:30 PM, Sean Owen <sr...@gmail.com> wrote:

> Vendor X may be slightly against having two Flink-for-X distributions --
> their own and another on a site/project they may not control.
>
> Are all these builds really needed? meaning, does a generic Hadoop 2.x
> build not work on some or most of these? I'd hope so. Might keep things
> simpler for everyone. For example, are the "CDH5" and "HDP2.1" builds not
> really just roughly "Hadoop 2.4" builds? If 2.4 needs its own profile so be
> it, but it need not be so specific to a flavor.
>
> How about some simple steps to at least de-emphasize vendor builds? like a
> separate page or pop-down panel?
>
> I can understand wanting to make it as simple as possible to access the
> right build straight away, since these distros don't have Flink yet of
> course.
>
> And hey, we make concessions in OSS to different versions of Java or Linux
> vs Windows all the time. The bright line isn't clear.
>
> Perhaps: take steps to treat this more as a special case, and produce these
> types of builds only where needed? where a non-trivial number of potential
> users will have trouble consuming the project without a tweak, create a
> special release on the side?
>
> On Mon, Aug 18, 2014 at 5:05 PM, Alan Gates <ga...@hortonworks.com> wrote:
>
> > My concern with this is it appears to put Apache in the business of
> > picking the right Hadoop vendors.  What about IBM, Pivotal, etc.?  I get
> > that the actual desire here is to make things easy for users, and that
> the
> > original three packages offered (Hadoop1, CDH4, Hadoop2) will cover 95%
> of
> > users.  I like that.  I just don't know how to do this and avoid the
> > appearance of favoritism.
> >
> > Perhaps the next best step is to ask on incubator-general and see if
> there
> > is an Apache wide policy or if there needs to be one.
> >
> > Alan.
> >
> >   Robert Metzger <rm...@apache.org>
> >  August 18, 2014 at 6:54
> > Hi,
> >
> > I think we all agree that our project benefits from providing
> pre-compiled
> > binaries for different hadoop distributions.
> >
> > I've drafted an extension of the current download page, that I would
> > suggest to use after the release: http://i.imgur.com/MucW2HD.png
> > As you can see, users can directly pick the Flink version they want (it's
> > not going to show the CDH4 package there) or they can choose from the
> table
> > with the most popular (in my opinion) vendor distributions.
> > The different links still point to the "hadoop1", "hadoop2" binaries,
> but I
> > don't think this is highlighting any hadoop vendors.
> >
> > What do you think?
> >
> >
> > On Fri, Aug 15, 2014 at 11:45 PM, Henry Saputra <henry.saputra@gmail.com
> >
> > <he...@gmail.com>
> >
> >   Henry Saputra <he...@gmail.com>
> >  August 15, 2014 at 14:45
> > Ah sorry Alan, did not see your reply to Owen.
> >
> > Mea culpa from me.
> >
> > - Henry
> >
> >
> >
> >   Alan Gates <ga...@hortonworks.com>
> >  August 15, 2014 at 14:15
> >  Sorry, apparently this was unclear, as others asked the same question.
> > Flink hasn't had any Apache releases yet.  I was referring to the
> proposed
> > release that Robert sent out,
> > http://people.apache.org/~rmetzger/flink-0.6-incubating-rc7/
> >
> > Alan.
> >
> >
> >   Sean Owen <sr...@gmail.com>
> >  August 15, 2014 at 11:26
> > PS, sorry for being dense, but I don't see vendor packages at
> > http://flink.incubator.apache.org/downloads.html ?
> >
> > Is it this page?
> > http://flink.incubator.apache.org/docs/0.6-SNAPSHOT/building.html
> >
> > That's more benign, just helping people rebuild for certain distros if
> > desired. Can the example be generified to refer to a fictional "ACME
> > Distribution"? But a note here and there about gotchas building for
> > certain versions and combos seems reasonable.
> >
> > I also find this bit in the build script, although vendor-specific, is
> > a small nice convenience for users:
> > https://github.com/apache/incubator-flink/blob/master/pom.xml#L195
> >   Owen O'Malley <om...@apache.org>
> >  August 15, 2014 at 11:01
> > As a mentor, I agree that vendor specific packages aren't appropriate for
> > the Apache site. (Disclosure: I work at Hortonworks.) Working with the
> > vendors to make packages available is great, but they shouldn't be hosted
> > at Apache.
> >
> > .. Owen
> >
> >
> >
> >
> > --
> > Sent with Postbox <http://www.getpostbox.com>
> >
> > CONFIDENTIALITY NOTICE
> > NOTICE: This message is intended for the use of the individual or entity
> > to which it is addressed and may contain information that is
> confidential,
> > privileged and exempt from disclosure under applicable law. If the reader
> > of this message is not the intended recipient, you are hereby notified
> that
> > any printing, copying, dissemination, distribution, disclosure or
> > forwarding of this communication is strictly prohibited. If you have
> > received this communication in error, please contact the sender
> immediately
> > and delete it from your system. Thank You.
> >
>

Re: Question on providing CDH packages

Posted by Sean Owen <sr...@gmail.com>.
Vendor X may be slightly against having two Flink-for-X distributions --
their own and another on a site/project they may not control.

Are all these builds really needed? meaning, does a generic Hadoop 2.x
build not work on some or most of these? I'd hope so. Might keep things
simpler for everyone. For example, are the "CDH5" and "HDP2.1" builds not
really just roughly "Hadoop 2.4" builds? If 2.4 needs its own profile so be
it, but it need not be so specific to a flavor.

How about some simple steps to at least de-emphasize vendor builds? like a
separate page or pop-down panel?

I can understand wanting to make it as simple as possible to access the
right build straight away, since these distros don't have Flink yet of
course.

And hey, we make concessions in OSS to different versions of Java or Linux
vs Windows all the time. The bright line isn't clear.

Perhaps: take steps to treat this more as a special case, and produce these
types of builds only where needed? where a non-trivial number of potential
users will have trouble consuming the project without a tweak, create a
special release on the side?







On Mon, Aug 18, 2014 at 5:05 PM, Alan Gates <ga...@hortonworks.com> wrote:

> My concern with this is it appears to put Apache in the business of
> picking the right Hadoop vendors.  What about IBM, Pivotal, etc.?  I get
> that the actual desire here is to make things easy for users, and that the
> original three packages offered (Hadoop1, CDH4, Hadoop2) will cover 95% of
> users.  I like that.  I just don't know how to do this and avoid the
> appearance of favoritism.
>
> Perhaps the next best step is to ask on incubator-general and see if there
> is an Apache wide policy or if there needs to be one.
>
> Alan.
>
>   Robert Metzger <rm...@apache.org>
>  August 18, 2014 at 6:54
> Hi,
>
> I think we all agree that our project benefits from providing pre-compiled
> binaries for different hadoop distributions.
>
> I've drafted an extension of the current download page, that I would
> suggest to use after the release: http://i.imgur.com/MucW2HD.png
> As you can see, users can directly pick the Flink version they want (it's
> not going to show the CDH4 package there) or they can choose from the table
> with the most popular (in my opinion) vendor distributions.
> The different links still point to the "hadoop1", "hadoop2" binaries, but I
> don't think this is highlighting any hadoop vendors.
>
> What do you think?
>
>
> On Fri, Aug 15, 2014 at 11:45 PM, Henry Saputra <he...@gmail.com>
> <he...@gmail.com>
>
>   Henry Saputra <he...@gmail.com>
>  August 15, 2014 at 14:45
> Ah sorry Alan, did not see your reply to Owen.
>
> Mea culpa from me.
>
> - Henry
>
>
>
>   Alan Gates <ga...@hortonworks.com>
>  August 15, 2014 at 14:15
>  Sorry, apparently this was unclear, as others asked the same question.
> Flink hasn't had any Apache releases yet.  I was referring to the proposed
> release that Robert sent out,
> http://people.apache.org/~rmetzger/flink-0.6-incubating-rc7/
>
> Alan.
>
>
>   Sean Owen <sr...@gmail.com>
>  August 15, 2014 at 11:26
> PS, sorry for being dense, but I don't see vendor packages at
> http://flink.incubator.apache.org/downloads.html ?
>
> Is it this page?
> http://flink.incubator.apache.org/docs/0.6-SNAPSHOT/building.html
>
> That's more benign, just helping people rebuild for certain distros if
> desired. Can the example be generified to refer to a fictional "ACME
> Distribution"? But a note here and there about gotchas building for
> certain versions and combos seems reasonable.
>
> I also find this bit in the build script, although vendor-specific, is
> a small nice convenience for users:
> https://github.com/apache/incubator-flink/blob/master/pom.xml#L195
>   Owen O'Malley <om...@apache.org>
>  August 15, 2014 at 11:01
> As a mentor, I agree that vendor specific packages aren't appropriate for
> the Apache site. (Disclosure: I work at Hortonworks.) Working with the
> vendors to make packages available is great, but they shouldn't be hosted
> at Apache.
>
> .. Owen
>
>
>
>
>

Re: Question on providing CDH packages

Posted by Alan Gates <ga...@hortonworks.com>.
My concern with this is it appears to put Apache in the business of 
picking the right Hadoop vendors.  What about IBM, Pivotal, etc.?  I get 
that the actual desire here is to make things easy for users, and that 
the original three packages offered (Hadoop1, CDH4, Hadoop2) will cover 
95% of users.  I like that.  I just don't know how to do this and avoid 
the appearance of favoritism.

Perhaps the next best step is to ask on incubator-general and see if 
there is an Apache wide policy or if there needs to be one.

Alan.

> Robert Metzger <ma...@apache.org>
> August 18, 2014 at 6:54
> Hi,
>
> I think we all agree that our project benefits from providing pre-compiled
> binaries for different hadoop distributions.
>
> I've drafted an extension of the current download page, that I would
> suggest to use after the release: http://i.imgur.com/MucW2HD.png
> As you can see, users can directly pick the Flink version they want (it's
> not going to show the CDH4 package there) or they can choose from the 
> table
> with the most popular (in my opinion) vendor distributions.
> The different links still point to the "hadoop1", "hadoop2" binaries, 
> but I
> don't think this is highlighting any hadoop vendors.
>
> What do you think?
>
>
> On Fri, Aug 15, 2014 at 11:45 PM, Henry Saputra <he...@gmail.com>
>
> Henry Saputra <ma...@gmail.com>
> August 15, 2014 at 14:45
> Ah sorry Alan, did not see your reply to Owen.
>
> Mea culpa from me.
>
> - Henry
>
>
>
> Alan Gates <ma...@hortonworks.com>
> August 15, 2014 at 14:15
> Sorry, apparently this was unclear, as others asked the same 
> question.  Flink hasn't had any Apache releases yet.  I was referring 
> to the proposed release that Robert sent out, 
> http://people.apache.org/~rmetzger/flink-0.6-incubating-rc7/
>
> Alan.
>
>
> Sean Owen <ma...@gmail.com>
> August 15, 2014 at 11:26
> PS, sorry for being dense, but I don't see vendor packages at
> http://flink.incubator.apache.org/downloads.html ?
>
> Is it this page?
> http://flink.incubator.apache.org/docs/0.6-SNAPSHOT/building.html
>
> That's more benign, just helping people rebuild for certain distros if
> desired. Can the example be generified to refer to a fictional "ACME
> Distribution"? But a note here and there about gotchas building for
> certain versions and combos seems reasonable.
>
> I also find this bit in the build script, although vendor-specific, is
> a small nice convenience for users:
> https://github.com/apache/incubator-flink/blob/master/pom.xml#L195
> Owen O'Malley <ma...@apache.org>
> August 15, 2014 at 11:01
> As a mentor, I agree that vendor specific packages aren't appropriate for
> the Apache site. (Disclosure: I work at Hortonworks.) Working with the
> vendors to make packages available is great, but they shouldn't be hosted
> at Apache.
>
> .. Owen
>
>
>


Re: Question on providing CDH packages

Posted by Stephan Ewen <se...@apache.org>.
The approach seems fair in the way it presents all vendors equally and
still offers users a convenient way to get started.

I personally like it, but I cannot say to what extent this is compliant with
Apache policies.

Re: Question on providing CDH packages

Posted by Robert Metzger <rm...@apache.org>.
Hi,

I think we all agree that our project benefits from providing pre-compiled
binaries for different hadoop distributions.

I've drafted an extension of the current download page, that I would
suggest to use after the release: http://i.imgur.com/MucW2HD.png
As you can see, users can directly pick the Flink version they want (it's
not going to show the CDH4 package there) or they can choose from the table
with the most popular (in my opinion) vendor distributions.
The different links still point to the "hadoop1", "hadoop2" binaries, but I
don't think this is highlighting any hadoop vendors.

What do you think?


On Fri, Aug 15, 2014 at 11:45 PM, Henry Saputra <he...@gmail.com>
wrote:

> Ah sorry Alan, did not see your reply to Owen.
>
> Mea culpa from me.
>
> - Henry
>
>
> On Fri, Aug 15, 2014 at 2:15 PM, Alan Gates <ga...@hortonworks.com> wrote:
>
> > Sorry, apparently this was unclear, as others asked the same question.
> > Flink hasn't had any Apache releases yet.  I was referring to the
> proposed
> > release that Robert sent out,
> > http://people.apache.org/~rmetzger/flink-0.6-incubating-rc7/
> >
> > Alan.
> >
> >   Sean Owen <sr...@gmail.com>
> >  August 15, 2014 at 11:26 AM
> > PS, sorry for being dense, but I don't see vendor packages at
> > http://flink.incubator.apache.org/downloads.html ?
> >
> > Is it this page?
> > http://flink.incubator.apache.org/docs/0.6-SNAPSHOT/building.html
> >
> > That's more benign, just helping people rebuild for certain distros if
> > desired. Can the example be generified to refer to a fictional "ACME
> > Distribution"? But a note here and there about gotchas building for
> > certain versions and combos seems reasonable.
> >
> > I also find this bit in the build script, although vendor-specific, is
> > a small nice convenience for users:
> > https://github.com/apache/incubator-flink/blob/master/pom.xml#L195
> >   Owen O'Malley <om...@apache.org>
> >  August 15, 2014 at 11:01 AM
> > As a mentor, I agree that vendor specific packages aren't appropriate for
> > the Apache site. (Disclosure: I work at Hortonworks.) Working with the
> > vendors to make packages available is great, but they shouldn't be hosted
> > at Apache.
> >
> > .. Owen
> >
> >
> >
> >   Sean Owen <sr...@gmail.com>
> >  August 15, 2014 at 10:32 AM
> > I hope not surprisingly, I agree. (Backstory: I am at Cloudera.) I
> > have for example lobbied Spark to remove CDH-specific releases and
> > build profiles. Not just for this reason, but because it is often
> > unnecessary to have vendor-specific builds, and also just increases
> > maintenance overhead for the project.
> >
> > Matei et al say they want to make it as easy as possible to consume
> > Spark, and so provide vendor-build-specific artifacts and such here
> > and there. To be fair, Spark tries to support a large range of Hadoop
> > and YARN versions, and getting the right combination of profiles and
> > versions right to recreate a vendor release was kind of hard until
> > about Hadoop 2.2 (stable YARN really).
> >
> > I haven't heard of any formal policy. I would ask whether there are
> > similar reasons to produce pre-packaged releases like so?
> >
> >   Alan Gates <ga...@hortonworks.com>
> >  August 15, 2014 at 10:24 AM
> >  Let me begin by noting that I obviously have a conflict of interest
> since
> > my company is a direct competitor to Cloudera.  But as a mentor and
> Apache
> > member I believe I need to bring this up.
> >
> > What is the Apache policy towards having a vendor specific package on a
> > download site?  It is strange to me to come to Flink's website and see
> > packages for Flink with CDH (or HDP or MapR or whatever).  We should
> avoid
> > providing vendor specific packages.  It gives the appearance of
> preferring
> > one vendor over another, which Apache does not want to do.
> >
> > I have no problem at all with Cloudera hosting a CDH specific package of
> > Flink, nor with Flink project members working with Cloudera to create
> such
> > a package.  But I do not think they should be hosted at Apache.
> >
> > Alan.
> >
> >
> >
>

Re: Question on providing CDH packages

Posted by Henry Saputra <he...@gmail.com>.
Ah sorry Alan, did not see your reply to Owen.

Mea culpa from me.

- Henry


On Fri, Aug 15, 2014 at 2:15 PM, Alan Gates <ga...@hortonworks.com> wrote:

> Sorry, apparently this was unclear, as others asked the same question.
> Flink hasn't had any Apache releases yet.  I was referring to the proposed
> release that Robert sent out,
> http://people.apache.org/~rmetzger/flink-0.6-incubating-rc7/
>
> Alan.
>
>   Sean Owen <sr...@gmail.com>
>  August 15, 2014 at 11:26 AM
> PS, sorry for being dense, but I don't see vendor packages at
> http://flink.incubator.apache.org/downloads.html ?
>
> Is it this page?
> http://flink.incubator.apache.org/docs/0.6-SNAPSHOT/building.html
>
> That's more benign, just helping people rebuild for certain distros if
> desired. Can the example be generified to refer to a fictional "ACME
> Distribution"? But a note here and there about gotchas building for
> certain versions and combos seems reasonable.
>
> I also find this bit in the build script, although vendor-specific, is
> a small nice convenience for users:
> https://github.com/apache/incubator-flink/blob/master/pom.xml#L195
>   Owen O'Malley <om...@apache.org>
>  August 15, 2014 at 11:01 AM
> As a mentor, I agree that vendor specific packages aren't appropriate for
> the Apache site. (Disclosure: I work at Hortonworks.) Working with the
> vendors to make packages available is great, but they shouldn't be hosted
> at Apache.
>
> .. Owen
>
>
>
>   Sean Owen <sr...@gmail.com>
>  August 15, 2014 at 10:32 AM
> I hope not surprisingly, I agree. (Backstory: I am at Cloudera.) I
> have for example lobbied Spark to remove CDH-specific releases and
> build profiles. Not just for this reason, but because it is often
> unnecessary to have vendor-specific builds, and also just increases
> maintenance overhead for the project.
>
> Matei et al say they want to make it as easy as possible to consume
> Spark, and so provide vendor-build-specific artifacts and such here
> and there. To be fair, Spark tries to support a large range of Hadoop
> and YARN versions, and getting the right combination of profiles and
> versions right to recreate a vendor release was kind of hard until
> about Hadoop 2.2 (stable YARN really).
>
> I haven't heard of any formal policy. I would ask whether there are
> similar reasons to produce pre-packaged releases like so?
>
>   Alan Gates <ga...@hortonworks.com>
>  August 15, 2014 at 10:24 AM
>  Let me begin by noting that I obviously have a conflict of interest since
> my company is a direct competitor to Cloudera.  But as a mentor and Apache
> member I believe I need to bring this up.
>
> What is the Apache policy towards having a vendor specific package on a
> download site?  It is strange to me to come to Flink's website and see
> packages for Flink with CDH (or HDP or MapR or whatever).  We should avoid
> providing vendor specific packages.  It gives the appearance of preferring
> one vendor over another, which Apache does not want to do.
>
> I have no problem at all with Cloudera hosting a CDH specific package of
> Flink, nor with Flink project members working with Cloudera to create such
> a package.  But I do not think they should be hosted at Apache.
>
> Alan.
>
>
>

Re: Question on providing CDH packages

Posted by Alan Gates <ga...@hortonworks.com>.
Sorry, apparently this was unclear, as others asked the same question.  
Flink hasn't had any Apache releases yet.  I was referring to the 
proposed release that Robert sent out, 
http://people.apache.org/~rmetzger/flink-0.6-incubating-rc7/

Alan.

> Sean Owen <ma...@gmail.com>
> August 15, 2014 at 11:26 AM
> PS, sorry for being dense, but I don't see vendor packages at
> http://flink.incubator.apache.org/downloads.html ?
>
> Is it this page?
> http://flink.incubator.apache.org/docs/0.6-SNAPSHOT/building.html
>
> That's more benign, just helping people rebuild for certain distros if
> desired. Can the example be generified to refer to a fictional "ACME
> Distribution"? But a note here and there about gotchas building for
> certain versions and combos seems reasonable.
>
> I also find this bit in the build script, although vendor-specific, is
> a small nice convenience for users:
> https://github.com/apache/incubator-flink/blob/master/pom.xml#L195
> Owen O'Malley <ma...@apache.org>
> August 15, 2014 at 11:01 AM
> As a mentor, I agree that vendor specific packages aren't appropriate for
> the Apache site. (Disclosure: I work at Hortonworks.) Working with the
> vendors to make packages available is great, but they shouldn't be hosted
> at Apache.
>
> .. Owen
>
>
>
> Sean Owen <ma...@gmail.com>
> August 15, 2014 at 10:32 AM
> I hope not surprisingly, I agree. (Backstory: I am at Cloudera.) I
> have for example lobbied Spark to remove CDH-specific releases and
> build profiles. Not just for this reason, but because it is often
> unnecessary to have vendor-specific builds, and also just increases
> maintenance overhead for the project.
>
> Matei et al say they want to make it as easy as possible to consume
> Spark, and so provide vendor-build-specific artifacts and such here
> and there. To be fair, Spark tries to support a large range of Hadoop
> and YARN versions, and getting the right combination of profiles and
> versions right to recreate a vendor release was kind of hard until
> about Hadoop 2.2 (stable YARN really).
>
> I haven't heard of any formal policy. I would ask whether there are
> similar reasons to produce pre-packaged releases like so?
>
> Alan Gates <ma...@hortonworks.com>
> August 15, 2014 at 10:24 AM
> Let me begin by noting that I obviously have a conflict of interest 
> since my company is a direct competitor to Cloudera.  But as a mentor 
> and Apache member I believe I need to bring this up.
>
> What is the Apache policy towards having a vendor specific package on 
> a download site?  It is strange to me to come to Flink's website and 
> see packages for Flink with CDH (or HDP or MapR or whatever).  We 
> should avoid providing vendor specific packages.  It gives the 
> appearance of preferring one vendor over another, which Apache does 
> not want to do.
>
> I have no problem at all with Cloudera hosting a CDH specific package 
> of Flink, nor with Flink project members working with Cloudera to 
> create such a package.  But I do not think they should be hosted at 
> Apache.
>
> Alan.


Re: Question on providing CDH packages

Posted by Henry Saputra <he...@gmail.com>.
Hi Sean, I don't think Flink has done a release yet.

We are trying to do several RCs to get one that is good enough to be voted on.

- Henry

On Fri, Aug 15, 2014 at 11:26 AM, Sean Owen <sr...@gmail.com> wrote:
> PS, sorry for being dense, but I don't see vendor packages at
> http://flink.incubator.apache.org/downloads.html ?
>
> Is it this page?
> http://flink.incubator.apache.org/docs/0.6-SNAPSHOT/building.html
>
> That's more benign, just helping people rebuild for certain distros if
> desired. Can the example be generified to refer to a fictional "ACME
> Distribution"? But a note here and there about gotchas building for
> certain versions and combos seems reasonable.
>
> I also find this bit in the build script, although vendor-specific, is
> a small nice convenience for users:
> https://github.com/apache/incubator-flink/blob/master/pom.xml#L195
>
> On Fri, Aug 15, 2014 at 7:01 PM, Owen O'Malley <om...@apache.org> wrote:
>> As a mentor, I agree that vendor specific packages aren't appropriate for
>> the Apache site. (Disclosure: I work at Hortonworks.) Working with the
>> vendors to make packages available is great, but they shouldn't be hosted
>> at Apache.
>>
>> .. Owen
>>
>>
>> On Fri, Aug 15, 2014 at 10:32 AM, Sean Owen <sr...@gmail.com> wrote:
>>
>>> I hope not surprisingly, I agree. (Backstory: I am at Cloudera.) I
>>> have for example lobbied Spark to remove CDH-specific releases and
>>> build profiles. Not just for this reason, but because it is often
>>> unnecessary to have vendor-specific builds, and also just increases
>>> maintenance overhead for the project.
>>>
>>> Matei et al say they want to make it as easy as possible to consume
>>> Spark, and so provide vendor-build-specific artifacts and such here
>>> and there. To be fair, Spark tries to support a large range of Hadoop
>>> and YARN versions, and getting the right combination of profiles and
>>> versions right to recreate a vendor release was kind of hard until
>>> about Hadoop 2.2 (stable YARN really).
>>>
>>> I haven't heard of any formal policy. I would ask whether there are
>>> similar reasons to produce pre-packaged releases like so?
>>>
>>>
>>> On Fri, Aug 15, 2014 at 6:24 PM, Alan Gates <ga...@hortonworks.com> wrote:
>>> > Let me begin by noting that I obviously have a conflict of interest
>>> since my
>>> > company is a direct competitor to Cloudera.  But as a mentor and Apache
>>> > member I believe I need to bring this up.
>>> >
>>> > What is the Apache policy towards having a vendor specific package on a
>>> > download site?  It is strange to me to come to Flink's website and see
>>> > packages for Flink with CDH (or HDP or MapR or whatever).  We should
>>> avoid
>>> > providing vendor specific packages.  It gives the appearance of
>>> preferring
>>> > one vendor over another, which Apache does not want to do.
>>> >
>>> > I have no problem at all with Cloudera hosting a CDH specific package of
>>> > Flink, nor with Flink project members working with Cloudera to create
>>> such a
>>> > package.  But I do not think they should be hosted at Apache.
>>> >
>>> > Alan.
>>>

Re: Question on providing CDH packages

Posted by Sean Owen <sr...@gmail.com>.
PS, sorry for being dense, but I don't see vendor packages at
http://flink.incubator.apache.org/downloads.html ?

Is it this page?
http://flink.incubator.apache.org/docs/0.6-SNAPSHOT/building.html

That's more benign, just helping people rebuild for certain distros if
desired. Can the example be generified to refer to a fictional "ACME
Distribution"? But a note here and there about gotchas building for
certain versions and combos seems reasonable.

I also find this bit in the build script, although vendor-specific, is
a small nice convenience for users:
https://github.com/apache/incubator-flink/blob/master/pom.xml#L195
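
The pom.xml bit referenced above is a vendor-repository convenience; a Maven profile along these lines is the usual shape. The id and URL below are illustrative placeholders, not copied from Flink's pom:

```xml
<!-- Illustrative sketch of a vendor-repository profile: activating it lets
     vendor-specific Hadoop artifacts resolve. The id and URL are examples. -->
<profile>
  <id>vendor-repo</id>
  <repositories>
    <repository>
      <id>example-vendor-releases</id>
      <url>https://repo.example-vendor.com/releases/</url>
    </repository>
  </repositories>
</profile>
```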

On Fri, Aug 15, 2014 at 7:01 PM, Owen O'Malley <om...@apache.org> wrote:
> As a mentor, I agree that vendor specific packages aren't appropriate for
> the Apache site. (Disclosure: I work at Hortonworks.) Working with the
> vendors to make packages available is great, but they shouldn't be hosted
> at Apache.
>
> .. Owen
>
>
> On Fri, Aug 15, 2014 at 10:32 AM, Sean Owen <sr...@gmail.com> wrote:
>
>> I hope not surprisingly, I agree. (Backstory: I am at Cloudera.) I
>> have for example lobbied Spark to remove CDH-specific releases and
>> build profiles. Not just for this reason, but because it is often
>> unnecessary to have vendor-specific builds, and also just increases
>> maintenance overhead for the project.
>>
>> Matei et al say they want to make it as easy as possible to consume
>> Spark, and so provide vendor-build-specific artifacts and such here
>> and there. To be fair, Spark tries to support a large range of Hadoop
>> and YARN versions, and getting the right combination of profiles and
>> versions right to recreate a vendor release was kind of hard until
>> about Hadoop 2.2 (stable YARN really).
>>
>> I haven't heard of any formal policy. I would ask whether there are
>> similar reasons to produce pre-packaged releases like so?
>>
>>
>> On Fri, Aug 15, 2014 at 6:24 PM, Alan Gates <ga...@hortonworks.com> wrote:
>> > Let me begin by noting that I obviously have a conflict of interest
>> since my
>> > company is a direct competitor to Cloudera.  But as a mentor and Apache
>> > member I believe I need to bring this up.
>> >
>> > What is the Apache policy towards having a vendor specific package on a
>> > download site?  It is strange to me to come to Flink's website and see
>> > packages for Flink with CDH (or HDP or MapR or whatever).  We should
>> avoid
>> > providing vendor specific packages.  It gives the appearance of
>> preferring
>> > one vendor over another, which Apache does not want to do.
>> >
>> > I have no problem at all with Cloudera hosting a CDH specific package of
>> > Flink, nor with Flink project members working with Cloudera to create
>> such a
>> > package.  But I do not think they should be hosted at Apache.
>> >
>> > Alan.
>>

Re: Question on providing CDH packages

Posted by Owen O'Malley <om...@apache.org>.
As a mentor, I agree that vendor specific packages aren't appropriate for
the Apache site. (Disclosure: I work at Hortonworks.) Working with the
vendors to make packages available is great, but they shouldn't be hosted
at Apache.

.. Owen


On Fri, Aug 15, 2014 at 10:32 AM, Sean Owen <sr...@gmail.com> wrote:

> I hope not surprisingly, I agree. (Backstory: I am at Cloudera.) I
> have for example lobbied Spark to remove CDH-specific releases and
> build profiles. Not just for this reason, but because it is often
> unnecessary to have vendor-specific builds, and also just increases
> maintenance overhead for the project.
>
> Matei et al say they want to make it as easy as possible to consume
> Spark, and so provide vendor-build-specific artifacts and such here
> and there. To be fair, Spark tries to support a large range of Hadoop
> and YARN versions, and getting the right combination of profiles and
> versions right to recreate a vendor release was kind of hard until
> about Hadoop 2.2 (stable YARN really).
>
> I haven't heard of any formal policy. I would ask whether there are
> similar reasons to produce pre-packaged releases like so?
>
>
> On Fri, Aug 15, 2014 at 6:24 PM, Alan Gates <ga...@hortonworks.com> wrote:
> > Let me begin by noting that I obviously have a conflict of interest
> > since my company is a direct competitor to Cloudera.  But as a mentor
> > and Apache member I believe I need to bring this up.
> >
> > What is the Apache policy towards having a vendor specific package on
> > a download site?  It is strange to me to come to Flink's website and
> > see packages for Flink with CDH (or HDP or MapR or whatever).  We
> > should avoid providing vendor specific packages.  It gives the
> > appearance of preferring one vendor over another, which Apache does
> > not want to do.
> >
> > I have no problem at all with Cloudera hosting a CDH specific package
> > of Flink, nor with Flink project members working with Cloudera to
> > create such a package.  But I do not think they should be hosted at
> > Apache.
> >
> > Alan.

Re: Question on providing CDH packages

Posted by Sean Owen <sr...@gmail.com>.
I hope not surprisingly, I agree. (Backstory: I am at Cloudera.) I
have for example lobbied Spark to remove CDH-specific releases and
build profiles. Not just for this reason, but because it is often
unnecessary to have vendor-specific builds, and also just increases
maintenance overhead for the project.

Matei et al say they want to make it as easy as possible to consume
Spark, and so provide vendor-build-specific artifacts and such here
and there. To be fair, Spark tries to support a large range of Hadoop
and YARN versions, and getting the right combination of profiles and
versions right to recreate a vendor release was kind of hard until
about Hadoop 2.2 (stable YARN really).

I haven't heard of any formal policy. I would ask whether there are
similar reasons to produce pre-packaged releases like so?


On Fri, Aug 15, 2014 at 6:24 PM, Alan Gates <ga...@hortonworks.com> wrote:
> Let me begin by noting that I obviously have a conflict of interest since my
> company is a direct competitor to Cloudera.  But as a mentor and Apache
> member I believe I need to bring this up.
>
> What is the Apache policy towards having a vendor specific package on a
> download site?  It is strange to me to come to Flink's website and see
> packages for Flink with CDH (or HDP or MapR or whatever).  We should avoid
> providing vendor specific packages.  It gives the appearance of preferring
> one vendor over another, which Apache does not want to do.
>
> I have no problem at all with Cloudera hosting a CDH specific package of
> Flink, nor with Flink project members working with Cloudera to create such a
> package.  But I do not think they should be hosted at Apache.
>
> Alan.