Posted to user@flink.apache.org by Robert Metzger <rm...@apache.org> on 2016/10/13 14:47:36 UTC

[DISCUSS] Drop Hadoop 1 support with Flink 1.2

Hi,

The Apache Hadoop community has recently released the first alpha version
for Hadoop 3.0.0, while we are still supporting Hadoop 1. I think it's time
to finally drop Hadoop 1 support in Flink.

The last minor Hadoop 1 release was on 27 June 2014.
Apache Spark dropped Hadoop 1 support with their 2.0 release in July 2016.
Hadoop 2.2 was first released in October 2013, so there was enough time for
users to upgrade.

I also added the user@ list to the discussion to get opinions about this
from there as well.

Let me know what you think about this!


Regards,
Robert

Re: [DISCUSS] Drop Hadoop 1 support with Flink 1.2

Posted by Neelesh Salian <ns...@cloudera.com>.

+1 to dropping Hadoop 1.x
I am fairly certain there are very few legacy Hadoop users. 2.x is heavily
used at the moment.
Spark actually changed not just its supported Hadoop versions but its
Python versions as well.

Hadoop 3 will take a while to mature, so I would suggest holding off on
it until it is well baked and widely used.

@Robert, after sufficient folks have responded to this thread and
discussed, could you kick off a vote (if needed)? Then we can go ahead
and get that confirmed.
Thanks for bringing it up.


On Thu, Oct 13, 2016 at 8:02 AM, <ru...@accenture.com> wrote:

> I totally agree with Robert. From an industry point of view, we are not
> using Hadoop 1.x at any client. Even on legacy systems, we have already
> upgraded the software.
>
>
>
> *From:* Robert Metzger [mailto:rmetzger@apache.org]
> *Sent:* Thursday, October 13, 2016 16:48
> *To:* dev@flink.apache.org; user@flink.apache.org
> *Subject:* [DISCUSS] Drop Hadoop 1 support with Flink 1.2
>
>
>
> Hi,
>
>
>
> The Apache Hadoop community has recently released the first alpha version
> for Hadoop 3.0.0, while we are still supporting Hadoop 1. I think its time
> to finally drop Hadoop 1 support in Flink.
>
>
>
> The last minor Hadoop 1 release was in 27 June, 2014.
>
> Apache Spark dropped Hadoop 1 support with their 2.0 release in July 2016.
>
> Hadoop 2.2 was first released in October 2013, so there was enough time
> for users to upgrade.
>
>
>
> I added also the user@ list to the discussion to get opinions about this
> from there as well.
>
>
>
> Let me know what you think about this!
>
>
>
>
>
> Regards,
>
> Robert
>
> ------------------------------
>
> This message is for the designated recipient only and may contain
> privileged, proprietary, or otherwise confidential information. If you have
> received it in error, please notify the sender immediately and delete the
> original. Any other use of the e-mail by you is prohibited. Where allowed
> by local law, electronic communications with Accenture and its affiliates,
> including e-mail and instant messaging (including content), may be scanned
> by our systems for the purposes of information security and assessment of
> internal compliance with Accenture policy.
> ____________________________________________________________
> __________________________
>
> www.accenture.com
>



-- 
Neelesh Srinivas Salian
Engineer

RE: [DISCUSS] Drop Hadoop 1 support with Flink 1.2

Posted by ru...@accenture.com.

I totally agree with Robert. From an industry point of view, we are not using Hadoop 1.x at any client. Even on legacy systems, we have already upgraded the software.

From: Robert Metzger [mailto:rmetzger@apache.org]
Sent: Thursday, October 13, 2016 16:48
To: dev@flink.apache.org; user@flink.apache.org
Subject: [DISCUSS] Drop Hadoop 1 support with Flink 1.2

Hi,

The Apache Hadoop community has recently released the first alpha version for Hadoop 3.0.0, while we are still supporting Hadoop 1. I think it's time to finally drop Hadoop 1 support in Flink.

The last minor Hadoop 1 release was on 27 June 2014.
Apache Spark dropped Hadoop 1 support with their 2.0 release in July 2016.
Hadoop 2.2 was first released in October 2013, so there was enough time for users to upgrade.

I also added the user@ list to the discussion to get opinions about this from there as well.

Let me know what you think about this!

Regards,
Robert

Re: [DISCUSS] Drop Hadoop 1 support with Flink 1.2

Posted by Robert Metzger <rm...@apache.org>.

Thanks a lot for the feedback on the issue.

It seems that everybody agrees to drop Hadoop 1 support in Flink. I don't
think we need to vote on the issue.
I've filed a JIRA for the task:
https://issues.apache.org/jira/browse/FLINK-4895 Maybe I'll find time to
work on it next week.


On Fri, Oct 14, 2016 at 11:09 AM, Fabian Hueske <fh...@gmail.com> wrote:

> Thanks for the pointer.
> I'll start a separate discussion and push the PR forward if we come to an
> agreement.
>
> 2016-10-14 11:04 GMT+02:00 Stephan Ewen <se...@apache.org>:
>
> > @Fabian - Someone started with that in
> > https://issues.apache.org/jira/browse/FLINK-4315
> > That could be changed to not remove the methods from the
> > ExecutionEnvironment.
> >
> > On Fri, Oct 14, 2016 at 10:45 AM, Fabian Hueske <fh...@gmail.com>
> wrote:
> >
> > > Yes, I'm also +1 for removing the methods at some point.
> > >
> > > For 1.2 we could go ahead and move the Hadoop-MR connectors into a
> > separate
> > > module and mark the methods in ExecutionEnvironment as @deprecated.
> > > In 1.3 (or 2.0 whatever comes next) we could make the switch.
> > >
> > > 2016-10-14 10:40 GMT+02:00 Stephan Ewen <se...@apache.org>:
> > >
> > > > @Fabian Good point. For Flink 2.0, I would suggest to remove them
> from
> > > the
> > > > Environment and add them to a Utility. The way it is now, it ties
> Flink
> > > > very strongly to Hadoop.
> > > >
> > > > You are right, before we do that, there is no way to make a Hadoop
> > > > independent distribution.
> > > >
> > > > On Fri, Oct 14, 2016 at 10:37 AM, Fabian Hueske <fh...@gmail.com>
> > > wrote:
> > > >
> > > > > +1 for dropping Hadoop1 support.
> > > > >
> > > > > Regarding a binary release without Hadoop:
> > > > >
> > > > > What would we do about the readHadoopFile() and createHadoopInput()
> > on
> > > > the
> > > > > ExecutionEnvironment?
> > > > > These methods are declared as @PublicEvolving, so we did not commit
> > to
> > > > keep
> > > > > them.
> > > > > However that does not necessarily mean we should easily break the
> API
> > > > here
> > > > > esp. since the methods have not been declared @deprecated.
> > > > >
> > > > > Best, Fabian
> > > > >
> > > > >
> > > > >
> > > > > 2016-10-14 10:29 GMT+02:00 Stephan Ewen <se...@apache.org>:
> > > > >
> > > > > > @Greg
> > > > > >
> > > > > > I think that would be amazing. It does require a bit of cleanup,
> > > > though.
> > > > > As
> > > > > > far as I know, the Hadoop dependency is additionally used for
> some
> > > > > Kerberos
> > > > > > utilities and for its S3 file system implementation.
> > > > > > We would need to make the Kerberos part Hadoop independent and
> the
> > > > > > FileSystem loading dynamic (with a good exception that the Hadoop
> > > > > > dependency should be added if the filesystem cannot be loaded).
> > > > > >
> > > > > > Stephan
> > > > > >
> > > > > >
> > > > > > On Thu, Oct 13, 2016 at 8:55 PM, Greg Hogan <co...@greghogan.com>
> > > > wrote:
> > > > > >
> > > > > > > Okay, this sounds prudent. Would this be the right time to
> > > implement
> > > > > > > FLINK-2268 "Provide Flink binary release without Hadoop"?
> > > > > > >
> > > > > > > On Thu, Oct 13, 2016 at 11:25 AM, Stephan Ewen <
> sewen@apache.org
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > +1 for dropping Hadoop1 support
> > > > > > > >
> > > > > > > > @greg There is quite some complexity in the build setup and
> > > release
> > > > > > > scripts
> > > > > > > > and testing to support Hadoop 1. Also, we have to prepare to
> > add
> > > > > > support
> > > > > > > > for Hadoop 3, and then supporting in addition Hadoop 1 seems
> > very
> > > > > > tough.
> > > > > > > >
> > > > > > > > Stephan
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Oct 13, 2016 at 5:04 PM, Greg Hogan <
> > code@greghogan.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Robert,
> > > > > > > > >
> > > > > > > > > What are the benefits to Flink for dropping Hadoop 1
> support?
> > > Is
> > > > > > there
> > > > > > > > > significant code cleanup or would we simply be publishing
> one
> > > > less
> > > > > > set
> > > > > > > of
> > > > > > > > > artifacts?
> > > > > > > > >
> > > > > > > > > Greg
> > > > > > > > >
> > > > > > > > > On Thu, Oct 13, 2016 at 10:47 AM, Robert Metzger <
> > > > > > rmetzger@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > The Apache Hadoop community has recently released the
> first
> > > > alpha
> > > > > > > > version
> > > > > > > > > > for Hadoop 3.0.0, while we are still supporting Hadoop
> 1. I
> > > > think
> > > > > > its
> > > > > > > > > time
> > > > > > > > > > to finally drop Hadoop 1 support in Flink.
> > > > > > > > > >
> > > > > > > > > > The last minor Hadoop 1 release was in 27 June, 2014.
> > > > > > > > > > Apache Spark dropped Hadoop 1 support with their 2.0
> > release
> > > in
> > > > > > July
> > > > > > > > > 2016.
> > > > > > > > > > Hadoop 2.2 was first released in October 2013, so there
> was
> > > > > enough
> > > > > > > time
> > > > > > > > > > for users to upgrade.
> > > > > > > > > >
> > > > > > > > > > I added also the user@ list to the discussion to get
> > > opinions
> > > > > > about
> > > > > > > > this
> > > > > > > > > > from there as well.
> > > > > > > > > >
> > > > > > > > > > Let me know what you think about this!
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Robert
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Drop Hadoop 1 support with Flink 1.2

Posted by Fabian Hueske <fh...@gmail.com>.

Thanks for the pointer.
I'll start a separate discussion and push the PR forward if we come to an
agreement.

2016-10-14 11:04 GMT+02:00 Stephan Ewen <se...@apache.org>:

> @Fabian - Someone started with that in
> https://issues.apache.org/jira/browse/FLINK-4315
> That could be changed to not remove the methods from the
> ExecutionEnvironment.
>
> On Fri, Oct 14, 2016 at 10:45 AM, Fabian Hueske <fh...@gmail.com> wrote:
>
> > Yes, I'm also +1 for removing the methods at some point.
> >
> > For 1.2 we could go ahead and move the Hadoop-MR connectors into a
> separate
> > module and mark the methods in ExecutionEnvironment as @deprecated.
> > In 1.3 (or 2.0 whatever comes next) we could make the switch.
> >
> > 2016-10-14 10:40 GMT+02:00 Stephan Ewen <se...@apache.org>:
> >
> > > @Fabian Good point. For Flink 2.0, I would suggest to remove them from
> > the
> > > Environment and add them to a Utility. The way it is now, it ties Flink
> > > very strongly to Hadoop.
> > >
> > > You are right, before we do that, there is no way to make a Hadoop
> > > independent distribution.
> > >
> > > On Fri, Oct 14, 2016 at 10:37 AM, Fabian Hueske <fh...@gmail.com>
> > wrote:
> > >
> > > > +1 for dropping Hadoop1 support.
> > > >
> > > > Regarding a binary release without Hadoop:
> > > >
> > > > What would we do about the readHadoopFile() and createHadoopInput()
> on
> > > the
> > > > ExecutionEnvironment?
> > > > These methods are declared as @PublicEvolving, so we did not commit
> to
> > > keep
> > > > them.
> > > > However that does not necessarily mean we should easily break the API
> > > here
> > > > esp. since the methods have not been declared @deprecated.
> > > >
> > > > Best, Fabian
> > > >
> > > >
> > > >
> > > > 2016-10-14 10:29 GMT+02:00 Stephan Ewen <se...@apache.org>:
> > > >
> > > > > @Greg
> > > > >
> > > > > I think that would be amazing. It does require a bit of cleanup,
> > > though.
> > > > As
> > > > > far as I know, the Hadoop dependency is additionally used for some
> > > > Kerberos
> > > > > utilities and for its S3 file system implementation.
> > > > > We would need to make the Kerberos part Hadoop independent and the
> > > > > FileSystem loading dynamic (with a good exception that the Hadoop
> > > > > dependency should be added if the filesystem cannot be loaded).
> > > > >
> > > > > Stephan
> > > > >
> > > > >
> > > > > On Thu, Oct 13, 2016 at 8:55 PM, Greg Hogan <co...@greghogan.com>
> > > wrote:
> > > > >
> > > > > > Okay, this sounds prudent. Would this be the right time to
> > implement
> > > > > > FLINK-2268 "Provide Flink binary release without Hadoop"?
> > > > > >
> > > > > > On Thu, Oct 13, 2016 at 11:25 AM, Stephan Ewen <sewen@apache.org
> >
> > > > wrote:
> > > > > >
> > > > > > > +1 for dropping Hadoop1 support
> > > > > > >
> > > > > > > @greg There is quite some complexity in the build setup and
> > release
> > > > > > scripts
> > > > > > > and testing to support Hadoop 1. Also, we have to prepare to
> add
> > > > > support
> > > > > > > for Hadoop 3, and then supporting in addition Hadoop 1 seems
> very
> > > > > tough.
> > > > > > >
> > > > > > > Stephan
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Oct 13, 2016 at 5:04 PM, Greg Hogan <
> code@greghogan.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Robert,
> > > > > > > >
> > > > > > > > What are the benefits to Flink for dropping Hadoop 1 support?
> > Is
> > > > > there
> > > > > > > > significant code cleanup or would we simply be publishing one
> > > less
> > > > > set
> > > > > > of
> > > > > > > > artifacts?
> > > > > > > >
> > > > > > > > Greg
> > > > > > > >
> > > > > > > > On Thu, Oct 13, 2016 at 10:47 AM, Robert Metzger <
> > > > > rmetzger@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > The Apache Hadoop community has recently released the first
> > > alpha
> > > > > > > version
> > > > > > > > > for Hadoop 3.0.0, while we are still supporting Hadoop 1. I
> > > think
> > > > > its
> > > > > > > > time
> > > > > > > > > to finally drop Hadoop 1 support in Flink.
> > > > > > > > >
> > > > > > > > > The last minor Hadoop 1 release was in 27 June, 2014.
> > > > > > > > > Apache Spark dropped Hadoop 1 support with their 2.0
> release
> > in
> > > > > July
> > > > > > > > 2016.
> > > > > > > > > Hadoop 2.2 was first released in October 2013, so there was
> > > > enough
> > > > > > time
> > > > > > > > > for users to upgrade.
> > > > > > > > >
> > > > > > > > > I added also the user@ list to the discussion to get
> > opinions
> > > > > about
> > > > > > > this
> > > > > > > > > from there as well.
> > > > > > > > >
> > > > > > > > > Let me know what you think about this!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Robert
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Drop Hadoop 1 support with Flink 1.2

Posted by Stephan Ewen <se...@apache.org>.

@Fabian - Someone started with that in
https://issues.apache.org/jira/browse/FLINK-4315
That could be changed to not remove the methods from the
ExecutionEnvironment.

On Fri, Oct 14, 2016 at 10:45 AM, Fabian Hueske <fh...@gmail.com> wrote:

> Yes, I'm also +1 for removing the methods at some point.
>
> For 1.2 we could go ahead and move the Hadoop-MR connectors into a separate
> module and mark the methods in ExecutionEnvironment as @deprecated.
> In 1.3 (or 2.0 whatever comes next) we could make the switch.
>
> 2016-10-14 10:40 GMT+02:00 Stephan Ewen <se...@apache.org>:
>
> > @Fabian Good point. For Flink 2.0, I would suggest to remove them from
> the
> > Environment and add them to a Utility. The way it is now, it ties Flink
> > very strongly to Hadoop.
> >
> > You are right, before we do that, there is no way to make a Hadoop
> > independent distribution.
> >
> > On Fri, Oct 14, 2016 at 10:37 AM, Fabian Hueske <fh...@gmail.com>
> wrote:
> >
> > > +1 for dropping Hadoop1 support.
> > >
> > > Regarding a binary release without Hadoop:
> > >
> > > What would we do about the readHadoopFile() and createHadoopInput() on
> > the
> > > ExecutionEnvironment?
> > > These methods are declared as @PublicEvolving, so we did not commit to
> > keep
> > > them.
> > > However that does not necessarily mean we should easily break the API
> > here
> > > esp. since the methods have not been declared @deprecated.
> > >
> > > Best, Fabian
> > >
> > >
> > >
> > > 2016-10-14 10:29 GMT+02:00 Stephan Ewen <se...@apache.org>:
> > >
> > > > @Greg
> > > >
> > > > I think that would be amazing. It does require a bit of cleanup,
> > though.
> > > As
> > > > far as I know, the Hadoop dependency is additionally used for some
> > > Kerberos
> > > > utilities and for its S3 file system implementation.
> > > > We would need to make the Kerberos part Hadoop independent and the
> > > > FileSystem loading dynamic (with a good exception that the Hadoop
> > > > dependency should be added if the filesystem cannot be loaded).
> > > >
> > > > Stephan
> > > >
> > > >
> > > > On Thu, Oct 13, 2016 at 8:55 PM, Greg Hogan <co...@greghogan.com>
> > wrote:
> > > >
> > > > > Okay, this sounds prudent. Would this be the right time to
> implement
> > > > > FLINK-2268 "Provide Flink binary release without Hadoop"?
> > > > >
> > > > > On Thu, Oct 13, 2016 at 11:25 AM, Stephan Ewen <se...@apache.org>
> > > wrote:
> > > > >
> > > > > > +1 for dropping Hadoop1 support
> > > > > >
> > > > > > @greg There is quite some complexity in the build setup and
> release
> > > > > scripts
> > > > > > and testing to support Hadoop 1. Also, we have to prepare to add
> > > > support
> > > > > > for Hadoop 3, and then supporting in addition Hadoop 1 seems very
> > > > tough.
> > > > > >
> > > > > > Stephan
> > > > > >
> > > > > >
> > > > > > On Thu, Oct 13, 2016 at 5:04 PM, Greg Hogan <co...@greghogan.com>
> > > > wrote:
> > > > > >
> > > > > > > Hi Robert,
> > > > > > >
> > > > > > > What are the benefits to Flink for dropping Hadoop 1 support?
> Is
> > > > there
> > > > > > > significant code cleanup or would we simply be publishing one
> > less
> > > > set
> > > > > of
> > > > > > > artifacts?
> > > > > > >
> > > > > > > Greg
> > > > > > >
> > > > > > > On Thu, Oct 13, 2016 at 10:47 AM, Robert Metzger <
> > > > rmetzger@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > The Apache Hadoop community has recently released the first
> > alpha
> > > > > > version
> > > > > > > > for Hadoop 3.0.0, while we are still supporting Hadoop 1. I
> > think
> > > > its
> > > > > > > time
> > > > > > > > to finally drop Hadoop 1 support in Flink.
> > > > > > > >
> > > > > > > > The last minor Hadoop 1 release was in 27 June, 2014.
> > > > > > > > Apache Spark dropped Hadoop 1 support with their 2.0 release
> in
> > > > July
> > > > > > > 2016.
> > > > > > > > Hadoop 2.2 was first released in October 2013, so there was
> > > enough
> > > > > time
> > > > > > > > for users to upgrade.
> > > > > > > >
> > > > > > > > I added also the user@ list to the discussion to get
> opinions
> > > > about
> > > > > > this
> > > > > > > > from there as well.
> > > > > > > >
> > > > > > > > Let me know what you think about this!
> > > > > > > >
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Robert
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Drop Hadoop 1 support with Flink 1.2

Posted by Fabian Hueske <fh...@gmail.com>.

Yes, I'm also +1 for removing the methods at some point.

For 1.2 we could go ahead and move the Hadoop-MR connectors into a separate
module and mark the methods in ExecutionEnvironment as @deprecated.
In 1.3 (or 2.0, whichever comes next) we could make the switch.
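
To make the two-step plan above concrete, here is a minimal sketch of what
"deprecate now, switch later" could look like. It assumes nothing about the
real Flink classes: EnvironmentSketch and HadoopInputsSketch are hypothetical
stand-ins, and the method returns a plain string instead of touching Hadoop.

```java
/**
 * Hypothetical stand-in for a utility class that would live in a separate
 * Hadoop connector module. Not an actual Flink class.
 */
final class HadoopInputsSketch {
    private HadoopInputsSketch() {}

    /** Pretends to create a Hadoop input; returns a description of it instead. */
    static String readHadoopFile(String path) {
        return "hadoop-input:" + path;
    }
}

/**
 * Hypothetical stand-in for the environment class that hosts the method today.
 */
class EnvironmentSketch {
    /**
     * Kept for one more release so existing callers still compile, but flagged
     * so that they get a compiler warning.
     *
     * @deprecated use {@link HadoopInputsSketch#readHadoopFile(String)} instead.
     */
    @Deprecated
    public String readHadoopFile(String path) {
        // Delegate, so both entry points behave identically until removal.
        return HadoopInputsSketch.readHadoopFile(path);
    }
}
```

Because the deprecated method only delegates, it could later be deleted
without changing behavior for callers that have already moved to the utility.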

2016-10-14 10:40 GMT+02:00 Stephan Ewen <se...@apache.org>:

> @Fabian Good point. For Flink 2.0, I would suggest to remove them from the
> Environment and add them to a Utility. The way it is now, it ties Flink
> very strongly to Hadoop.
>
> You are right, before we do that, there is no way to make a Hadoop
> independent distribution.
>
> On Fri, Oct 14, 2016 at 10:37 AM, Fabian Hueske <fh...@gmail.com> wrote:
>
> > +1 for dropping Hadoop1 support.
> >
> > Regarding a binary release without Hadoop:
> >
> > What would we do about the readHadoopFile() and createHadoopInput() on
> the
> > ExecutionEnvironment?
> > These methods are declared as @PublicEvolving, so we did not commit to
> keep
> > them.
> > However that does not necessarily mean we should easily break the API
> here
> > esp. since the methods have not been declared @deprecated.
> >
> > Best, Fabian
> >
> >
> >
> > 2016-10-14 10:29 GMT+02:00 Stephan Ewen <se...@apache.org>:
> >
> > > @Greg
> > >
> > > I think that would be amazing. It does require a bit of cleanup,
> though.
> > As
> > > far as I know, the Hadoop dependency is additionally used for some
> > Kerberos
> > > utilities and for its S3 file system implementation.
> > > We would need to make the Kerberos part Hadoop independent and the
> > > FileSystem loading dynamic (with a good exception that the Hadoop
> > > dependency should be added if the filesystem cannot be loaded).
> > >
> > > Stephan
> > >
> > >
> > > On Thu, Oct 13, 2016 at 8:55 PM, Greg Hogan <co...@greghogan.com>
> wrote:
> > >
> > > > Okay, this sounds prudent. Would this be the right time to implement
> > > > FLINK-2268 "Provide Flink binary release without Hadoop"?
> > > >
> > > > On Thu, Oct 13, 2016 at 11:25 AM, Stephan Ewen <se...@apache.org>
> > wrote:
> > > >
> > > > > +1 for dropping Hadoop1 support
> > > > >
> > > > > @greg There is quite some complexity in the build setup and release
> > > > scripts
> > > > > and testing to support Hadoop 1. Also, we have to prepare to add
> > > support
> > > > > for Hadoop 3, and then supporting in addition Hadoop 1 seems very
> > > tough.
> > > > >
> > > > > Stephan
> > > > >
> > > > >
> > > > > On Thu, Oct 13, 2016 at 5:04 PM, Greg Hogan <co...@greghogan.com>
> > > wrote:
> > > > >
> > > > > > Hi Robert,
> > > > > >
> > > > > > What are the benefits to Flink for dropping Hadoop 1 support? Is
> > > there
> > > > > > significant code cleanup or would we simply be publishing one
> less
> > > set
> > > > of
> > > > > > artifacts?
> > > > > >
> > > > > > Greg
> > > > > >
> > > > > > On Thu, Oct 13, 2016 at 10:47 AM, Robert Metzger <
> > > rmetzger@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > The Apache Hadoop community has recently released the first
> alpha
> > > > > version
> > > > > > > for Hadoop 3.0.0, while we are still supporting Hadoop 1. I
> think
> > > its
> > > > > > time
> > > > > > > to finally drop Hadoop 1 support in Flink.
> > > > > > >
> > > > > > > The last minor Hadoop 1 release was in 27 June, 2014.
> > > > > > > Apache Spark dropped Hadoop 1 support with their 2.0 release in
> > > July
> > > > > > 2016.
> > > > > > > Hadoop 2.2 was first released in October 2013, so there was
> > enough
> > > > time
> > > > > > > for users to upgrade.
> > > > > > >
> > > > > > > I added also the user@ list to the discussion to get opinions
> > > about
> > > > > this
> > > > > > > from there as well.
> > > > > > >
> > > > > > > Let me know what you think about this!
> > > > > > >
> > > > > > >
> > > > > > > Regards,
> > > > > > > Robert
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Drop Hadoop 1 support with Flink 1.2

Posted by Stephan Ewen <se...@apache.org>.

@Fabian Good point. For Flink 2.0, I would suggest removing them from the
Environment and adding them to a utility class. The way it is now, it ties
Flink very strongly to Hadoop.

You are right: before we do that, there is no way to make a
Hadoop-independent distribution.
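
For the dynamic file system loading mentioned in my earlier mail (quoted
below), a rough sketch of the idea could look like this. It is only an
illustration: OptionalDependencyLoader is a hypothetical name, not existing
Flink code, and the error message wording is an assumption.

```java
/**
 * Hypothetical helper that loads a class by name at runtime, so the caller has
 * no compile-time dependency on it. Not an actual Flink class.
 */
final class OptionalDependencyLoader {
    private OptionalDependencyLoader() {}

    /** Instantiates the named class via its no-argument constructor. */
    static Object load(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            // The "good exception": tell the user what to add to the classpath.
            throw new IllegalStateException(
                "Could not load '" + className + "'. If this is a Hadoop file system, "
                    + "add the Hadoop dependency to the classpath.", e);
        } catch (ReflectiveOperationException e) {
            // The class exists but could not be instantiated.
            throw new IllegalStateException(
                "Could not instantiate '" + className + "'.", e);
        }
    }
}
```

With something like this, a distribution without Hadoop would fail only when
a Hadoop-backed file system is actually requested, and with a message that
says what is missing.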

On Fri, Oct 14, 2016 at 10:37 AM, Fabian Hueske <fh...@gmail.com> wrote:

> +1 for dropping Hadoop1 support.
>
> Regarding a binary release without Hadoop:
>
> What would we do about the readHadoopFile() and createHadoopInput() on the
> ExecutionEnvironment?
> These methods are declared as @PublicEvolving, so we did not commit to keep
> them.
> However that does not necessarily mean we should easily break the API here
> esp. since the methods have not been declared @deprecated.
>
> Best, Fabian
>
>
>
> 2016-10-14 10:29 GMT+02:00 Stephan Ewen <se...@apache.org>:
>
> > @Greg
> >
> > I think that would be amazing. It does require a bit of cleanup, though.
> As
> > far as I know, the Hadoop dependency is additionally used for some
> Kerberos
> > utilities and for its S3 file system implementation.
> > We would need to make the Kerberos part Hadoop independent and the
> > FileSystem loading dynamic (with a good exception that the Hadoop
> > dependency should be added if the filesystem cannot be loaded).
> >
> > Stephan
> >
> >
> > On Thu, Oct 13, 2016 at 8:55 PM, Greg Hogan <co...@greghogan.com> wrote:
> >
> > > Okay, this sounds prudent. Would this be the right time to implement
> > > FLINK-2268 "Provide Flink binary release without Hadoop"?
> > >
> > > On Thu, Oct 13, 2016 at 11:25 AM, Stephan Ewen <se...@apache.org>
> wrote:
> > >
> > > > +1 for dropping Hadoop1 support
> > > >
> > > > @greg There is quite some complexity in the build setup and release
> > > scripts
> > > > and testing to support Hadoop 1. Also, we have to prepare to add
> > support
> > > > for Hadoop 3, and then supporting in addition Hadoop 1 seems very
> > tough.
> > > >
> > > > Stephan
> > > >
> > > >
> > > > On Thu, Oct 13, 2016 at 5:04 PM, Greg Hogan <co...@greghogan.com>
> > wrote:
> > > >
> > > > > Hi Robert,
> > > > >
> > > > > What are the benefits to Flink for dropping Hadoop 1 support? Is
> > there
> > > > > significant code cleanup or would we simply be publishing one less
> > set
> > > of
> > > > > artifacts?
> > > > >
> > > > > Greg
> > > > >
> > > > > On Thu, Oct 13, 2016 at 10:47 AM, Robert Metzger <
> > rmetzger@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > The Apache Hadoop community has recently released the first alpha
> > > > version
> > > > > > for Hadoop 3.0.0, while we are still supporting Hadoop 1. I think
> > its
> > > > > time
> > > > > > to finally drop Hadoop 1 support in Flink.
> > > > > >
> > > > > > The last minor Hadoop 1 release was in 27 June, 2014.
> > > > > > Apache Spark dropped Hadoop 1 support with their 2.0 release in
> > July
> > > > > 2016.
> > > > > > Hadoop 2.2 was first released in October 2013, so there was
> enough
> > > time
> > > > > > for users to upgrade.
> > > > > >
> > > > > > I added also the user@ list to the discussion to get opinions
> > about
> > > > this
> > > > > > from there as well.
> > > > > >
> > > > > > Let me know what you think about this!
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Robert
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Drop Hadoop 1 support with Flink 1.2

Posted by Fabian Hueske <fh...@gmail.com>.

+1 for dropping Hadoop1 support.

Regarding a binary release without Hadoop:

What would we do about the readHadoopFile() and createHadoopInput() methods on
the ExecutionEnvironment?
These methods are declared as @PublicEvolving, so we did not commit to keeping
them.
However, that does not necessarily mean we should break the API lightly here,
especially since the methods have not been declared @deprecated.

Best, Fabian
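
The deprecate-before-removing path mentioned above could look roughly like the
following minimal Java sketch. Everything in it is an assumption made for
illustration: PublicEvolvingSketch is a stand-in annotation (not Flink's real
@PublicEvolving), SketchEnvironment is a hypothetical class, and its
readHadoopFile(String) has a simplified signature that does not match the real
method. The sketch only shows a method carrying both a stability marker and a
deprecation notice, plus a reflective check that the notice is present.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Stand-in stability marker for this sketch; not Flink's real annotation. */
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface PublicEvolvingSketch {}

/** Hypothetical environment class with one Hadoop-specific entry point. */
class SketchEnvironment {

    /**
     * Simplified, hypothetical signature.
     *
     * @deprecated Illustrative notice: callers would be pointed to a
     *     replacement before the method is removed in a later release.
     */
    @Deprecated
    @PublicEvolvingSketch
    public String readHadoopFile(String path) {
        return "hadoop-input:" + path;
    }
}

/** Checks via reflection whether a public method carries @Deprecated. */
final class DeprecationCheck {

    private DeprecationCheck() {}

    static boolean isDeprecated(Class<?> type, String methodName, Class<?>... params) {
        try {
            Method method = type.getMethod(methodName, params);
            return method.isAnnotationPresent(Deprecated.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such public method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            isDeprecated(SketchEnvironment.class, "readHadoopFile", String.class));
    }
}
```

With a notice like this in place for one release, removing the method later
would at least not come as a surprise to callers.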



2016-10-14 10:29 GMT+02:00 Stephan Ewen <se...@apache.org>:

> @Greg
>
> I think that would be amazing. It does require a bit of cleanup, though. As
> far as I know, the Hadoop dependency is additionally used for some Kerberos
> utilities and for its S3 file system implementation.
> We would need to make the Kerberos part Hadoop independent and the
> FileSystem loading dynamic (with a good exception that the Hadoop
> dependency should be added if the filesystem cannot be loaded).
>
> Stephan

Re: [DISCUSS] Drop Hadoop 1 support with Flink 1.2

Posted by Stephan Ewen <se...@apache.org>.

@Greg

I think that would be amazing. It does require a bit of cleanup, though. As
far as I know, the Hadoop dependency is additionally used for some Kerberos
utilities and for its S3 file system implementation.
We would need to make the Kerberos part Hadoop-independent and the
FileSystem loading dynamic (with a clear exception stating that the Hadoop
dependency should be added if the file system cannot be loaded).

Stephan
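
The dynamic loading idea above can be sketched in plain Java as follows. This
is a minimal illustration under stated assumptions, not Flink's actual
FileSystem code: OptionalFileSystemLoader and the class name
org.example.hypothetical.HadoopBackedFileSystem are made up for the example,
and a JDK class stands in for an implementation that is present. The point is
only that the class is looked up by name at runtime and that a missing class
produces an error telling the user to add the Hadoop dependency.

```java
/** Sketch: load an optional file system implementation by class name. */
final class OptionalFileSystemLoader {

    private OptionalFileSystemLoader() {}

    /**
     * Instantiates the named class through its no-argument constructor.
     * If the class (or one of its dependencies) is not on the classpath,
     * the error message points the user at the missing Hadoop dependency.
     */
    static Object load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            throw new IllegalStateException(
                "Could not load file system class '" + className + "'. "
                    + "The Hadoop dependency does not appear to be on the classpath; "
                    + "add it to use this file system.",
                e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "File system class '" + className
                    + "' was found but could not be instantiated.",
                e);
        }
    }

    public static void main(String[] args) {
        // A JDK class stands in for an implementation that is available.
        System.out.println(load("java.util.ArrayList").getClass().getName());

        // Hypothetical class name that is not on the classpath.
        try {
            load("org.example.hypothetical.HadoopBackedFileSystem");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the lookup happens by name, the calling code has no compile-time
reference to Hadoop classes, which is what a Hadoop-free binary release would
need.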


On Thu, Oct 13, 2016 at 8:55 PM, Greg Hogan <co...@greghogan.com> wrote:

> Okay, this sounds prudent. Would this be the right time to implement
> FLINK-2268 "Provide Flink binary release without Hadoop"?

Re: [DISCUSS] Drop Hadoop 1 support with Flink 1.2

Posted by Greg Hogan <co...@greghogan.com>.

Okay, this sounds prudent. Would this be the right time to implement
FLINK-2268 "Provide Flink binary release without Hadoop"?

On Thu, Oct 13, 2016 at 11:25 AM, Stephan Ewen <se...@apache.org> wrote:

> +1 for dropping Hadoop1 support
>
> @greg There is quite some complexity in the build setup and release scripts
> and testing to support Hadoop 1. Also, we have to prepare to add support
> for Hadoop 3, and then supporting in addition Hadoop 1 seems very tough.
>
> Stephan

Re: [DISCUSS] Drop Hadoop 1 support with Flink 1.2

Posted by Stephan Ewen <se...@apache.org>.

+1 for dropping Hadoop1 support

@greg There is quite some complexity in the build setup, release scripts,
and testing to support Hadoop 1. Also, we have to prepare to add support
for Hadoop 3, and supporting Hadoop 1 in addition to that seems very tough.

Stephan


On Thu, Oct 13, 2016 at 5:04 PM, Greg Hogan <co...@greghogan.com> wrote:

> Hi Robert,
>
> What are the benefits to Flink for dropping Hadoop 1 support? Is there
> significant code cleanup or would we simply be publishing one less set of
> artifacts?
>
> Greg

Re: [DISCUSS] Drop Hadoop 1 support with Flink 1.2

Posted by Greg Hogan <co...@greghogan.com>.

Hi Robert,

What are the benefits to Flink of dropping Hadoop 1 support? Is there
significant code cleanup, or would we simply be publishing one fewer set of
artifacts?

Greg

On Thu, Oct 13, 2016 at 10:47 AM, Robert Metzger <rm...@apache.org>
wrote:

> Hi,
>
> The Apache Hadoop community has recently released the first alpha version
> for Hadoop 3.0.0, while we are still supporting Hadoop 1. I think its time
> to finally drop Hadoop 1 support in Flink.
>
> The last minor Hadoop 1 release was in 27 June, 2014.
> Apache Spark dropped Hadoop 1 support with their 2.0 release in July 2016.
> Hadoop 2.2 was first released in October 2013, so there was enough time
> for users to upgrade.
>
> I added also the user@ list to the discussion to get opinions about this
> from there as well.
>
> Let me know what you think about this!
>
>
> Regards,
> Robert
>