Posted to dev@flink.apache.org by David Morávek <dm...@apache.org> on 2021/12/14 08:25:00 UTC

[DISCUSS] Changing the minimal supported version of Hadoop

Hi,

I'd like to start a discussion about raising the minimal Hadoop version
that Flink supports.

Even though the default value of the `hadoop.version` property is set to
2.8.3, we still ensure both runtime and compile-time compatibility with
Hadoop 2.4.x via the scheduled pipeline [1].
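
For context, a sketch of how such a property typically looks in the parent
pom.xml (the exact location and surrounding content vary between Flink
releases; the default shown is just the value mentioned above). It can be
overridden at build time, e.g. `mvn clean install -Dhadoop.version=2.8.5`:

```xml
<!-- Sketch only: the hadoop.version property as a Maven build property,
     overridable from the command line with -Dhadoop.version=... -->
<properties>
    <hadoop.version>2.8.3</hadoop.version>
</properties>
```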

Here is a list of the dates of the latest releases for each minor version
up to 2.8.x:

- Hadoop 2.4.1: Last commit on 6/30/2014
- Hadoop 2.5.2: Last commit on 11/15/2014
- Hadoop 2.6.5: Last commit on 10/11/2016
- Hadoop 2.7.7: Last commit on 7/18/2018
- Hadoop 2.8.5: Last commit on 9/8/2018

Since then, there have been two more minor releases on the 2.x branch and
four more minor releases on the 3.x branch.

Supporting the older versions requires reflection-based "hacks" in order
to support multiple versions at once.
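
To illustrate the pattern (this is a sketch, not the actual Flink code;
the probed class and method names are placeholders), such multi-version
support usually boils down to probing for a class or method reflectively
and falling back when it is absent:

```java
import java.lang.reflect.Method;

public class HadoopVersionShim {

    // Probe for an API class that only exists in newer Hadoop versions.
    // Callers pass a fully qualified name; a missing class means we are
    // running against an older Hadoop on the classpath.
    static boolean hasNewApi(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Invoke a zero-argument method reflectively so the code still
    // compiles and links against older versions that lack the method.
    static Object invokeIfPresent(Object target, String methodName) {
        try {
            Method m = target.getClass().getMethod(methodName);
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            return null; // method missing: skip the feature
        }
    }

    public static void main(String[] args) {
        // java.util.List exists on every JVM; the second name does not,
        // standing in for a version-dependent Hadoop class.
        System.out.println(hasNewApi("java.util.List"));
        System.out.println(hasNewApi("org.example.MissingClass"));
        System.out.println(invokeIfPresent("abc", "length"));
    }
}
```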

My proposal is to change the minimum supported version *to 2.8.5*. This
should simplify the Hadoop-related codebase as well as the CI build
infrastructure, as we would no longer have to test against the older
versions.

Please note that this only concerns minimal *client-side* compatibility.
The wire protocol should remain compatible with earlier versions [2], so
we should still be able to talk to any server on the 2.x major branch.

One small note on the 2.8.x branch: some of the classes we need are only
available in versions 2.8.4 and above, but I'm not sure an eventual need
to bump the patch version should factor into this decision, because both
2.8.4 and 2.8.5 are already quite old.

WDYT, is it time to upgrade? Looking forward to any thoughts on the topic!

[1]
https://github.com/apache/flink/blob/release-1.14.0/tools/azure-pipelines/build-apache-repo.yml#L123
[2]
https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility

Best,
D.

Re: [DISCUSS] Changing the minimal supported version of Hadoop

Posted by Xintong Song <to...@gmail.com>.
+1
Thanks for driving this, David.

Thank you~

Xintong Song



On Tue, Jan 4, 2022 at 4:28 AM Thomas Weise <th...@apache.org> wrote:

> +1 for bumping minimum supported Hadoop version to 2.8.5
>
> On Mon, Jan 3, 2022 at 12:25 AM David Morávek <dm...@apache.org> wrote:
> >
> > As there were no strong objections, we'll proceed with bumping the Hadoop
> > version to 2.8.5 and removing the safeguards and the CI for any earlier
> > versions. This will effectively make the Hadoop 2.8.5 the least supported
> > version in Flink 1.15.
> >
> > Best,
> > D.
> >
> > On Thu, Dec 23, 2021 at 11:03 AM Till Rohrmann <tr...@apache.org>
> wrote:
> >
> > > If there are no users strongly objecting to dropping Hadoop support
> for <
> > > 2.8, then I am +1 for this since otherwise we won't gain a lot as
> Xintong
> > > said.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Dec 22, 2021 at 10:33 AM David Morávek <dm...@apache.org>
> wrote:
> > >
> > > > Agreed, if we drop the CI for lower versions, there is actually no
> point
> > > > of having safeguards as we can't really test for them.
> > > >
> > > > Maybe one more thought (it's more of a feeling), I feel that users
> > > running
> > > > really old Hadoop versions are usually slower to adopt (they most
> likely
> > > > use what the current HDP / CDH version they use offers) and they are
> less
> > > > likely to use Flink 1.15 any time soon, but I don't have any strong
> data
> > > to
> > > > support this.
> > > >
> > > > D.
> > > >
> > >
>

Re: [DISCUSS] Changing the minimal supported version of Hadoop

Posted by Thomas Weise <th...@apache.org>.
+1 for bumping minimum supported Hadoop version to 2.8.5

On Mon, Jan 3, 2022 at 12:25 AM David Morávek <dm...@apache.org> wrote:
>
> As there were no strong objections, we'll proceed with bumping the Hadoop
> version to 2.8.5 and removing the safeguards and the CI for any earlier
> versions. This will effectively make the Hadoop 2.8.5 the least supported
> version in Flink 1.15.
>
> Best,
> D.
>
> On Thu, Dec 23, 2021 at 11:03 AM Till Rohrmann <tr...@apache.org> wrote:
>
> > If there are no users strongly objecting to dropping Hadoop support for <
> > 2.8, then I am +1 for this since otherwise we won't gain a lot as Xintong
> > said.
> >
> > Cheers,
> > Till
> >
> > On Wed, Dec 22, 2021 at 10:33 AM David Morávek <dm...@apache.org> wrote:
> >
> > > Agreed, if we drop the CI for lower versions, there is actually no point
> > > of having safeguards as we can't really test for them.
> > >
> > > Maybe one more thought (it's more of a feeling), I feel that users
> > running
> > > really old Hadoop versions are usually slower to adopt (they most likely
> > > use what the current HDP / CDH version they use offers) and they are less
> > > likely to use Flink 1.15 any time soon, but I don't have any strong data
> > to
> > > support this.
> > >
> > > D.
> > >
> >

Re: [DISCUSS] Changing the minimal supported version of Hadoop

Posted by David Morávek <dm...@apache.org>.
As there were no strong objections, we'll proceed with bumping the Hadoop
version to 2.8.5 and removing the safeguards and the CI coverage for any
earlier versions. This will effectively make Hadoop 2.8.5 the lowest
supported version in Flink 1.15.

Best,
D.

On Thu, Dec 23, 2021 at 11:03 AM Till Rohrmann <tr...@apache.org> wrote:

> If there are no users strongly objecting to dropping Hadoop support for <
> 2.8, then I am +1 for this since otherwise we won't gain a lot as Xintong
> said.
>
> Cheers,
> Till
>
> On Wed, Dec 22, 2021 at 10:33 AM David Morávek <dm...@apache.org> wrote:
>
> > Agreed, if we drop the CI for lower versions, there is actually no point
> > of having safeguards as we can't really test for them.
> >
> > Maybe one more thought (it's more of a feeling), I feel that users
> running
> > really old Hadoop versions are usually slower to adopt (they most likely
> > use what the current HDP / CDH version they use offers) and they are less
> > likely to use Flink 1.15 any time soon, but I don't have any strong data
> to
> > support this.
> >
> > D.
> >
>

Re: [DISCUSS] Changing the minimal supported version of Hadoop

Posted by Till Rohrmann <tr...@apache.org>.
If no users strongly object to dropping Hadoop support for < 2.8, then I
am +1 for this, since otherwise we won't gain a lot, as Xintong said.

Cheers,
Till

On Wed, Dec 22, 2021 at 10:33 AM David Morávek <dm...@apache.org> wrote:

> Agreed, if we drop the CI for lower versions, there is actually no point
> of having safeguards as we can't really test for them.
>
> Maybe one more thought (it's more of a feeling), I feel that users running
> really old Hadoop versions are usually slower to adopt (they most likely
> use what the current HDP / CDH version they use offers) and they are less
> likely to use Flink 1.15 any time soon, but I don't have any strong data to
> support this.
>
> D.
>

Re: [DISCUSS] Changing the minimal supported version of Hadoop

Posted by David Morávek <dm...@apache.org>.
Agreed; if we drop the CI for lower versions, there is actually no point
in having safeguards, as we can't really test them.

One more thought (it's more of a feeling): users running really old
Hadoop versions are usually slower to adopt (they most likely stick with
what their current HDP / CDH version offers) and are less likely to use
Flink 1.15 any time soon, but I don't have any strong data to support
this.

D.

Re: [DISCUSS] Changing the minimal supported version of Hadoop

Posted by Xintong Song <to...@gmail.com>.
Sorry to join the discussion late.

+1 for dropping support for hadoop versions < 2.8 from my side.

TBH, wrapping the reflection-based logic with safeguards sounds a bit
neither fish nor fowl to me. It weakens the major benefits we are looking
for by dropping support for early versions:
- The codebase is simplified, but not significantly. We still carry the
complexity of knowing which APIs may not exist in early versions.
- Without CI coverage, we provide no guarantee that Flink still works
with early Hadoop versions; otherwise, we fail to simplify the CI.

I'd suggest saying we no longer support Hadoop versions < 2.8 at all. If
our users can't accept that, we may consider keeping the codebase as is
and waiting a bit longer.

WDYT?

Thank you~

Xintong Song


[1]
https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility

On Wed, Dec 22, 2021 at 12:52 AM David Morávek <dm...@apache.org> wrote:

> CC user@f.a.o
>
> Is anyone aware of something that blocks us from doing the upgrade?
>
> D.
>
> On Tue, Dec 21, 2021 at 5:50 PM David Morávek <da...@gmail.com>
> wrote:
>
>> Hi Martijn,
>>
>> from person experience, most Hadoop users are lagging behind the release
>> lines by a lot, because upgrading a Hadoop cluster is not really a simply
>> task to achieve. I think for now, we can stay a bit conservative, nothing
>> blocks us for using 2.8.5 as we don't use any "newer" APIs in the code.
>>
>> As for Till's concern, we can still wrap the reflection based logic, to
>> be skipped in case of "NoClassDefFound" instead of "ClassNotFound" as we do
>> now.
>>
>> D.
>>
>>
>> On Tue, Dec 14, 2021 at 5:23 PM Martijn Visser <ma...@ververica.com>
>> wrote:
>>
>>> Hi David,
>>>
>>> Thanks for bringing this up for discussion! Given that Hadoop 2.8 is
>>> considered EOL, shouldn't we bump the version to Hadoop 2.10? [1]
>>>
>>> Best regards,
>>>
>>> Martijn
>>>
>>> [1]
>>>
>>> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Active+Release+Lines
>>>
>>> On Tue, 14 Dec 2021 at 10:28, Till Rohrmann <tr...@apache.org>
>>> wrote:
>>>
>>> > Hi David,
>>> >
>>> > I think we haven't updated our Hadoop dependencies in a long time.
>>> Hence,
>>> > it is probably time to do so. So +1 for upgrading to the latest patch
>>> > release.
>>> >
>>> > If newer 2.x Hadoop versions are compatible with 2.y with x >= y, then
>>> I
>>> > don't see a problem with dropping support for pre-bundled Hadoop
>>> versions <
>>> > 2.8. This could indeed help us decrease our build matrix a bit and,
>>> thus,
>>> > saving some build time.
>>> >
>>> > Concerning simplifying our code base to get rid of reflection logic
>>> etc. we
>>> > still might have to add a safeguard for features that are not
>>> supported by
>>> > earlier versions. According to the docs
>>> >
>>> > > YARN applications that attempt to use new APIs (including new fields
>>> in
>>> > data structures) that have not yet been deployed to the cluster can
>>> expect
>>> > link exceptions
>>> >
>>> > we can see link exceptions. We could get around this by saying that
>>> Flink
>>> > no longer supports Hadoop < 2.8. But this should be checked with our
>>> users
>>> > on the user ML at least.
>>> >
>>> > Cheers,
>>> > Till
>>> >
>>> > On Tue, Dec 14, 2021 at 9:25 AM David Morávek <dm...@apache.org> wrote:
>>> >
>>> > > Hi,
>>> > >
>>> > > I'd like to start a discussion about upgrading a minimal Hadoop
>>> version
>>> > > that Flink supports.
>>> > >
>>> > > Even though the default value for `hadoop.version` property is set to
>>> > > 2.8.3, we're still ensuring both runtime and compile compatibility
>>> with
>>> > > Hadoop 2.4.x with the scheduled pipeline[1].
>>> > >
>>> > > Here is list of dates of the latest releases for each minor version
>>> up to
>>> > > 2.8.x
>>> > >
>>> > > - Hadoop 2.4.1: Last commit on 6/30/2014
>>> > > - Hadoop 2.5.2: Last commit on 11/15/2014
>>> > > - Hadoop 2.6.5: Last commit on 10/11/2016
>>> > > - Hadoop 2.7.7: Last commit on 7/18/2018
>>> > > - Hadoop 2.8.5: Last commit on 9/8/2018
>>> > >
>>> > > Since then there were two more minor releases in 2.x branch and four
>>> more
>>> > > minor releases in 3.x branch.
>>> > >
>>> > > Supporting the older version involves reflection-based "hacks" for
>>> > > supporting multiple versions.
>>> > >
>>> > > My proposal would be changing the minimum supported version *to
>>> 2.8.5*.
>>> > > This should simplify the hadoop related codebase and simplify the CI
>>> > build
>>> > > infrastructure as we won't have to test for the older versions.
>>> > >
>>> > > Please note that this only involves a minimal *client side*
>>> > compatibility.
>>> > > The wire protocol should remain compatible with earlier versions
>>> [2], so
>>> > we
>>> > > should be able to talk with any servers in 2.x major branch.
>>> > >
>>> > > One small note for the 2.8.x branch, some of the classes we need are
>>> only
>>> > > available in 2.8.4 version and above, but I'm not sure we should
>>> take an
>>> > > eventual need for upgrading a patch version into consideration here,
>>> > > because both 2.8.4 and 2.8.5 are pretty old.
>>> > >
>>> > > WDYT, is it already time to upgrade? Looking forward for any
>>> thoughts on
>>> > > the topic!
>>> > >
>>> > > [1]
>>> > >
>>> > >
>>> >
>>> https://github.com/apache/flink/blob/release-1.14.0/tools/azure-pipelines/build-apache-repo.yml#L123
>>> > > [2]
>>> > >
>>> > >
>>> >
>>> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility
>>> > >
>>> > > Best,
>>> > > D.
>>> > >
>>> >
>>>
>>

Re: [DISCUSS] Changing the minimal supported version of Hadoop

Posted by David Morávek <dm...@apache.org>.
CC user@f.a.o

Is anyone aware of something that blocks us from doing the upgrade?

D.

On Tue, Dec 21, 2021 at 5:50 PM David Morávek <da...@gmail.com>
wrote:

> Hi Martijn,
>
> from person experience, most Hadoop users are lagging behind the release
> lines by a lot, because upgrading a Hadoop cluster is not really a simply
> task to achieve. I think for now, we can stay a bit conservative, nothing
> blocks us for using 2.8.5 as we don't use any "newer" APIs in the code.
>
> As for Till's concern, we can still wrap the reflection based logic, to be
> skipped in case of "NoClassDefFound" instead of "ClassNotFound" as we do
> now.
>
> D.
>
>
> On Tue, Dec 14, 2021 at 5:23 PM Martijn Visser <ma...@ververica.com>
> wrote:
>
>> Hi David,
>>
>> Thanks for bringing this up for discussion! Given that Hadoop 2.8 is
>> considered EOL, shouldn't we bump the version to Hadoop 2.10? [1]
>>
>> Best regards,
>>
>> Martijn
>>
>> [1]
>>
>> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Active+Release+Lines
>>
>> On Tue, 14 Dec 2021 at 10:28, Till Rohrmann <tr...@apache.org> wrote:
>>
>> > Hi David,
>> >
>> > I think we haven't updated our Hadoop dependencies in a long time.
>> Hence,
>> > it is probably time to do so. So +1 for upgrading to the latest patch
>> > release.
>> >
>> > If newer 2.x Hadoop versions are compatible with 2.y with x >= y, then I
>> > don't see a problem with dropping support for pre-bundled Hadoop
>> versions <
>> > 2.8. This could indeed help us decrease our build matrix a bit and,
>> thus,
>> > saving some build time.
>> >
>> > Concerning simplifying our code base to get rid of reflection logic
>> etc. we
>> > still might have to add a safeguard for features that are not supported
>> by
>> > earlier versions. According to the docs
>> >
>> > > YARN applications that attempt to use new APIs (including new fields
>> in
>> > data structures) that have not yet been deployed to the cluster can
>> expect
>> > link exceptions
>> >
>> > we can see link exceptions. We could get around this by saying that
>> Flink
>> > no longer supports Hadoop < 2.8. But this should be checked with our
>> users
>> > on the user ML at least.
>> >
>> > Cheers,
>> > Till
>> >
>> > On Tue, Dec 14, 2021 at 9:25 AM David Morávek <dm...@apache.org> wrote:
>> >
>> > > Hi,
>> > >
>> > > I'd like to start a discussion about upgrading a minimal Hadoop
>> version
>> > > that Flink supports.
>> > >
>> > > Even though the default value for `hadoop.version` property is set to
>> > > 2.8.3, we're still ensuring both runtime and compile compatibility
>> with
>> > > Hadoop 2.4.x with the scheduled pipeline[1].
>> > >
>> > > Here is list of dates of the latest releases for each minor version
>> up to
>> > > 2.8.x
>> > >
>> > > - Hadoop 2.4.1: Last commit on 6/30/2014
>> > > - Hadoop 2.5.2: Last commit on 11/15/2014
>> > > - Hadoop 2.6.5: Last commit on 10/11/2016
>> > > - Hadoop 2.7.7: Last commit on 7/18/2018
>> > > - Hadoop 2.8.5: Last commit on 9/8/2018
>> > >
>> > > Since then there were two more minor releases in 2.x branch and four
>> more
>> > > minor releases in 3.x branch.
>> > >
>> > > Supporting the older version involves reflection-based "hacks" for
>> > > supporting multiple versions.
>> > >
>> > > My proposal would be changing the minimum supported version *to
>> 2.8.5*.
>> > > This should simplify the hadoop related codebase and simplify the CI
>> > build
>> > > infrastructure as we won't have to test for the older versions.
>> > >
>> > > Please note that this only involves a minimal *client side*
>> > compatibility.
>> > > The wire protocol should remain compatible with earlier versions [2],
>> so
>> > we
>> > > should be able to talk with any servers in 2.x major branch.
>> > >
>> > > One small note for the 2.8.x branch, some of the classes we need are
>> only
>> > > available in 2.8.4 version and above, but I'm not sure we should take
>> an
>> > > eventual need for upgrading a patch version into consideration here,
>> > > because both 2.8.4 and 2.8.5 are pretty old.
>> > >
>> > > WDYT, is it already time to upgrade? Looking forward for any thoughts
>> on
>> > > the topic!
>> > >
>> > > [1]
>> > >
>> > >
>> >
>> https://github.com/apache/flink/blob/release-1.14.0/tools/azure-pipelines/build-apache-repo.yml#L123
>> > > [2]
>> > >
>> > >
>> >
>> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility
>> > >
>> > > Best,
>> > > D.
>> > >
>> >
>>
>

Re: [DISCUSS] Changing the minimal supported version of Hadoop

Posted by David Morávek <da...@gmail.com>.
Hi Martijn,

from personal experience, most Hadoop users lag behind the release lines
by a lot, because upgrading a Hadoop cluster is not really a simple task.
I think for now we can stay a bit conservative; nothing blocks us from
using 2.8.5, as we don't use any "newer" APIs in the code.

As for Till's concern, we can still wrap the reflection-based logic so it
is skipped on "NoClassDefFoundError" instead of "ClassNotFoundException",
as we do now.
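
A minimal sketch of that safeguard (illustrative names, not actual Flink
classes; the boolean flag simulates an older classpath): the guarded call
catches NoClassDefFoundError, which the JVM throws when directly linked
code references a class missing at runtime, whereas ClassNotFoundException
only surfaces from explicit reflective loading.

```java
public class LinkageSafeguard {

    // Stand-in for a directly linked call into a Hadoop API that may be
    // absent at runtime. On an older classpath the JVM would raise
    // NoClassDefFoundError at the call site; we simulate that here.
    static String callNewHadoopApi(boolean apiPresent) {
        if (!apiPresent) {
            throw new NoClassDefFoundError("org/apache/hadoop/hypothetical/NewApi");
        }
        return "new-api";
    }

    // Wrap the call so the feature is skipped gracefully on older Hadoop
    // versions instead of crashing with a linkage error.
    static String guarded(boolean apiPresent) {
        try {
            return callNewHadoopApi(apiPresent);
        } catch (NoClassDefFoundError e) {
            return "fallback";
        }
    }

    public static void main(String[] args) {
        System.out.println(guarded(true));
        System.out.println(guarded(false));
    }
}
```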

D.


On Tue, Dec 14, 2021 at 5:23 PM Martijn Visser <ma...@ververica.com>
wrote:


Re: [DISCUSS] Changing the minimal supported version of Hadoop

Posted by Martijn Visser <ma...@ververica.com>.
Hi David,

Thanks for bringing this up for discussion! Given that Hadoop 2.8 is
considered EOL, shouldn't we bump the version to Hadoop 2.10? [1]

Best regards,

Martijn

[1]
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Active+Release+Lines

On Tue, 14 Dec 2021 at 10:28, Till Rohrmann <tr...@apache.org> wrote:


Re: [DISCUSS] Changing the minimal supported version of Hadoop

Posted by Till Rohrmann <tr...@apache.org>.
Hi David,

I think we haven't updated our Hadoop dependencies in a long time. Hence,
it is probably time to do so. So +1 for upgrading to the latest patch
release.

If newer 2.x Hadoop versions are compatible with 2.y for x >= y, then I
don't see a problem with dropping support for pre-bundled Hadoop versions <
2.8. This could indeed help us decrease our build matrix a bit and, thus,
save some build time.

Concerning simplifying our code base to get rid of the reflection logic
etc., we might still have to add a safeguard for features that are not
supported by earlier versions. According to the docs,

> YARN applications that attempt to use new APIs (including new fields in
data structures) that have not yet been deployed to the cluster can expect
link exceptions

so we could run into link exceptions. We could get around this by saying
that Flink no longer supports Hadoop < 2.8, but this should at least be
checked with our users on the user ML.
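
As a hedged sketch of such a safeguard (nothing here is Flink's real YARN
code; all names are invented): NoSuchMethodError and NoClassDefFoundError
both extend LinkageError, so a single catch clause can downgrade a call
into a not-yet-deployed API to a fallback:

```java
import java.util.function.Supplier;

// Sketch of a runtime safeguard for APIs that may not be linkable.
// NoSuchMethodError and NoClassDefFoundError both extend LinkageError,
// so one catch clause covers the "new API not present" cases.
public class ApiSafeguard {

    /** Runs the call; returns the fallback if the API fails to link. */
    static <T> T callOrFallback(Supplier<T> newApiCall, T fallback) {
        try {
            return newApiCall.get();
        } catch (LinkageError e) {
            // The newer API is not available in this environment.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Simulate a failed link by throwing the error directly.
        String result = callOrFallback(
                () -> { throw new NoSuchMethodError("newApi"); }, "fallback");
        System.out.println(result); // prints "fallback"
    }
}
```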

Cheers,
Till

On Tue, Dec 14, 2021 at 9:25 AM David Morávek <dm...@apache.org> wrote:

> Hi,
>
> I'd like to start a discussion about upgrading the minimal Hadoop version
> that Flink supports.
>
> Even though the default value for `hadoop.version` property is set to
> 2.8.3, we're still ensuring both runtime and compile compatibility with
> Hadoop 2.4.x with the scheduled pipeline[1].
>
> Here is a list of dates of the latest releases for each minor version up to
> 2.8.x
>
> - Hadoop 2.4.1: Last commit on 6/30/2014
> - Hadoop 2.5.2: Last commit on 11/15/2014
> - Hadoop 2.6.5: Last commit on 10/11/2016
> - Hadoop 2.7.7: Last commit on 7/18/2018
> - Hadoop 2.8.5: Last commit on 9/8/2018
>
> Since then there have been two more minor releases in the 2.x branch and
> four more minor releases in the 3.x branch.
>
> Supporting the older versions involves reflection-based "hacks" for
> supporting multiple versions.
>
> My proposal would be changing the minimum supported version *to 2.8.5*.
> This should simplify the Hadoop-related codebase and simplify the CI build
> infrastructure as we won't have to test for the older versions.
>
> Please note that this only involves a minimal *client side* compatibility.
> The wire protocol should remain compatible with earlier versions [2], so we
> should be able to talk to any server in the 2.x major branch.
>
> One small note for the 2.8.x branch, some of the classes we need are only
> available in 2.8.4 version and above, but I'm not sure we should take an
> eventual need for upgrading a patch version into consideration here,
> because both 2.8.4 and 2.8.5 are pretty old.
>
> WDYT, is it already time to upgrade? Looking forward to any thoughts on
> the topic!
>
> [1]
>
> https://github.com/apache/flink/blob/release-1.14.0/tools/azure-pipelines/build-apache-repo.yml#L123
> [2]
>
> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility
>
> Best,
> D.
>