Posted to user@flink.apache.org by Fabian Hueske <fh...@gmail.com> on 2016/10/14 09:29:22 UTC

[DISCUSS] Deprecate Hadoop source method from (batch) ExecutionEnvironment

Hi everybody,

I would like to propose deprecating the utility methods for reading data
with Hadoop InputFormats from the (batch) ExecutionEnvironment.

The motivation for deprecating these methods is to reduce Flink's
dependency on Hadoop and instead make Hadoop an optional dependency for
users who actually need it (HDFS, MapRed-Compat, ...). Eventually, we want
to have a Flink distribution that does not have a hard Hadoop dependency.

One step towards this is to remove the Hadoop dependency from flink-java
(Flink's Java DataSet API), which is currently required due to the above
utility methods (see FLINK-4315). We recently received a PR that addresses
FLINK-4315 and removes the Hadoop methods from the ExecutionEnvironment.
After some discussion, it was decided to defer the PR to Flink 2.0 because
it breaks the API (these methods are declared @PublicEvolving).

I propose to accept this PR for Flink 1.2, but to deprecate the methods
instead of removing them.
This would help users migrate old code and prevent new usage of these
methods. In a later Flink release (1.3 or 2.0), we could remove these
methods and flink-java's dependency on Hadoop.

What do others think?

Best, Fabian
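The deprecate-first, remove-later approach proposed above can be sketched in Java, the language of the DataSet API. This is a minimal illustration under stated assumptions: CoreEnvironment, OptionalHadoopInputs, and DeprecationCheck are hypothetical placeholders, not Flink classes, and readHadoopFile here returns a description string instead of a real Hadoop InputFormat. The actual replacement API is whatever the FLINK-4315 PR introduces. The deprecated method keeps its existing behavior and carries both the @Deprecated annotation and a @deprecated Javadoc tag pointing to the alternative. It does not delegate to the optional module, since the core module is not supposed to depend on it.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for a new, optional Hadoop compatibility module.
 * Only users who actually need Hadoop would add this artifact to their build.
 */
final class OptionalHadoopInputs {
    private OptionalHadoopInputs() {}

    /** Hypothetical replacement helper; returns a description, not a real input format. */
    static String readHadoopFile(String inputPath) {
        return "hadoop-input:" + inputPath;
    }
}

/** Hypothetical stand-in for the core (batch) execution environment. */
class CoreEnvironment {
    /**
     * Kept unchanged for one release so that existing programs still compile,
     * but marked deprecated so that the compiler and IDEs flag new usages.
     *
     * @deprecated Use {@link OptionalHadoopInputs#readHadoopFile(String)} from the
     *             optional module instead; this method may be removed in a later release.
     */
    @Deprecated
    String readHadoopFile(String inputPath) {
        // Existing behavior is preserved as-is. The core class must not depend on
        // the optional module, so it does not delegate to OptionalHadoopInputs.
        return "hadoop-input:" + inputPath;
    }
}

/** Checks for @Deprecated via reflection; the annotation has RUNTIME retention. */
final class DeprecationCheck {
    private DeprecationCheck() {}

    static boolean isDeprecated(Class<?> type, String methodName) {
        return Arrays.stream(type.getDeclaredMethods())
                .filter(m -> m.getName().equals(methodName))
                .anyMatch(m -> m.isAnnotationPresent(Deprecated.class));
    }
}
```

Existing calls such as new CoreEnvironment().readHadoopFile(path) keep compiling and behave as before, but the compiler reports the use of a deprecated API, which discourages new usage; migrated code calls OptionalHadoopInputs.readHadoopFile(path) instead. Removing the deprecated method in a later release then only breaks code that ignored the warning.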

Re: [DISCUSS] Deprecate Hadoop source method from (batch) ExecutionEnvironment

Posted by Shannon Carey <sc...@expedia.com>.

Yep!

From: Fabian Hueske <fh...@gmail.com>
Date: Friday, October 14, 2016 at 11:00 AM
To: Shannon Carey <sc...@expedia.com>
Cc: "user@flink.apache.org" <us...@flink.apache.org>
Subject: Re: [DISCUSS] Deprecate Hadoop source method from (batch) ExecutionEnvironment

Hi Shannon,

the plan is as follows:

We will keep the methods as they are for 1.2 but deprecate them, and at the same time we will add alternatives in an optional dependency.
In a later release, the deprecated methods will be removed and everybody will have to switch to the optional dependency.

Does that work for you?

Best, Fabian

Re: [DISCUSS] Deprecate Hadoop source method from (batch) ExecutionEnvironment

Posted by Fabian Hueske <fh...@gmail.com>.

Hi Shannon,

the plan is as follows:

We will keep the methods as they are for 1.2 but deprecate them, and at the
same time we will add alternatives in an optional dependency.
In a later release, the deprecated methods will be removed and everybody
will have to switch to the optional dependency.

Does that work for you?

Best, Fabian

2016-10-14 17:30 GMT+02:00 Shannon Carey <sc...@expedia.com>:

> Speaking as a user, if you are suggesting that you will retain the
> functionality but move the methods to an optional dependency, it makes
> sense to me. We have used the Hadoop integration for
> AvroParquetInputFormat and CqlBulkOutputFormat in Flink (although we won't
> be using CqlBulkOutputFormat any longer because it doesn't seem to be
> reliable).
>
> -Shannon
>

Re: [DISCUSS] Deprecate Hadoop source method from (batch) ExecutionEnvironment

Posted by Shannon Carey <sc...@expedia.com>.

Speaking as a user, if you are suggesting that you will retain the functionality but move the methods to an optional dependency, it makes sense to me. We have used the Hadoop integration for AvroParquetInputFormat and CqlBulkOutputFormat in Flink (although we won't be using CqlBulkOutputFormat any longer because it doesn't seem to be reliable).

-Shannon

Re: [DISCUSS] Deprecate Hadoop source method from (batch) ExecutionEnvironment

Posted by Robert Metzger <rm...@apache.org>.

+1

On Fri, Oct 14, 2016 at 12:04 PM, Stephan Ewen <se...@apache.org> wrote:

> +1
>
> On Fri, Oct 14, 2016 at 11:54 AM, Greg Hogan <co...@greghogan.com> wrote:
>
> > +1
> >
>

Re: [DISCUSS] Deprecate Hadoop source method from (batch) ExecutionEnvironment

Posted by Stephan Ewen <se...@apache.org>.

+1

On Fri, Oct 14, 2016 at 11:54 AM, Greg Hogan <co...@greghogan.com> wrote:

> +1
>

Re: [DISCUSS] Deprecate Hadoop source method from (batch) ExecutionEnvironment

Posted by Greg Hogan <co...@greghogan.com>.

+1

Re: [DISCUSS] Deprecate Hadoop source method from (batch) ExecutionEnvironment

Posted by Aljoscha Krettek <al...@apache.org>.

+1 for deprecating and then removing.

On Fri, 14 Oct 2016 at 11:38 Till Rohrmann <tr...@apache.org> wrote:

> Fabian's proposal sounds good to me. It would be a good first step towards
> removing our dependency on Hadoop.
>
> Thus, +1 for the changes.
>
> Cheers,
> Till
>

Re: [DISCUSS] Deprecate Hadoop source method from (batch) ExecutionEnvironment

Posted by Till Rohrmann <tr...@apache.org>.

Fabian's proposal sounds good to me. It would be a good first step towards
removing our dependency on Hadoop.

Thus, +1 for the changes.

Cheers,
Till
