Posted to dev@spark.apache.org by Erik Erlandson <ee...@redhat.com> on 2018/10/16 15:20:04 UTC

[DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

I'd like to propose including integration testing for Kerberos on the Spark
2.4 release:
https://github.com/apache/spark/pull/22608

Arguments in favor:
1) it improves testing coverage on a feature important for integrating with
HDFS deployments
2) its intersection with existing code is small - it consists primarily of
new testing code, with a bit of refactoring into 'main' and 'test'
sub-trees. These new tests appear stable.
3) Spark 2.4 is still in RC, with outstanding correctness issues.

The argument 'against' that I'm aware of would be the relatively large size
of the PR. I believe the points above address this, but I am soliciting
community feedback before committing.
Cheers,
Erik

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

Posted by Dongjoon Hyun <do...@gmail.com>.
I also agree with Reynold and Xiao.

Although I love that new feature, the Spark 2.4 branch cut was made a long
time ago.

We cannot backport new features at this stage, at RC4.

In addition, could you use separate Apache Spark issue IDs, Ilan? Sharing
one ID makes the discussion confusing.

     (1) [SPARK-23257][K8S] Kerberos Support for Spark on K8S (merged
yesterday for Apache Spark 3.0)
     (2) [SPARK-23257][K8S][TESTS] Kerberos Support Integration Tests (a
live PR with about *2000 lines. It's not a follow-up size.*)

(1) was merged yesterday, which means more people will start trying it from
today. We need more time to stabilize it.
(2) is still under review.

Both (1) and (2) look valid only for Spark 3.0.0.

Bests,
Dongjoon.



On Tue, Oct 16, 2018 at 1:32 PM Xiao Li <ga...@gmail.com> wrote:

> We need to strictly follow the backport and release policy. We can't merge
> such a new feature into an RC branch or a minor release (e.g., 2.4.1).
>
> Cheers,
>
> Xiao
>
> On Tue, Oct 16, 2018 at 12:48 PM Bolke de Bruin <bd...@gmail.com> wrote:
>
>> Chiming in here. We are in the same boat as Bloomberg.
>>
>> (But being a release manager often myself I understand the trade-off)
>>
>> B.
>>
>> On Tue, Oct 16, 2018 at 9:24 PM Ilan Filonenko <if...@cornell.edu> wrote:
>>
>>> On Erik's note, would SPARK-23257 be included in, say, 2.4.1? When would
>>> the next RC be? I would like to propose the inclusion of the Kerberos
>>> feature sooner rather than later as it would increase Spark-on-K8S adoption
>>> in production workloads while bringing greater feature parity with Yarn and
>>> Mesos. I would like to note that the feature itself is isolated from Core
>>> and isolated via the step-based architecture of the Kubernetes
>>> Driver/Executor builders.
>>>
>>> Furthermore, Spark users traditionally use HDFS for storage and in
>>> production use-cases these HDFS clusters would be kerberized. At Bloomberg,
>>> for example, all of the HDFS clusters are kerberized and for this reason,
>>> the only thing stopping our internal Data Science Platform from adopting
>>> Spark-on-K8S is this feature.
>>>
>>> On Tue, Oct 16, 2018 at 10:21 AM Erik Erlandson <ee...@redhat.com>
>>> wrote:
>>>
>>>>
>>>> SPARK-23257 merged more recently than I realized. If that isn't on
>>>> branch-2.4 then the first question is how soon on the release sequence that
>>>> can be adopted
>>>>
>>>> On Tue, Oct 16, 2018 at 9:33 AM Reynold Xin <rx...@databricks.com>
>>>> wrote:
>>>>
>>>>> We shouldn’t merge new features into release branches anymore.
>>>>>
>>>>> On Tue, Oct 16, 2018 at 6:32 PM Rob Vesse <rv...@dotnetrdf.org>
>>>>> wrote:
>>>>>
>>>>>> Right now the Kerberos support for Spark on K8S is only on master
>>>>>> AFAICT i.e. the feature is not present on branch-2.4
>>>>>>
>>>>>>
>>>>>>
>>>>>> Therefore I don’t see any point in adding the tests into branch-2.4
>>>>>> unless the plan is to also merge the Kerberos support to branch-2.4
>>>>>>
>>>>>>
>>>>>>
>>>>>> Rob
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From: *Erik Erlandson <ee...@redhat.com>
>>>>>> *Date: *Tuesday, 16 October 2018 at 16:47
>>>>>> *To: *dev <de...@spark.apache.org>
>>>>>> *Subject: *[DISCUSS][K8S][TESTS] Include Kerberos integration tests
>>>>>> for Spark 2.4
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'd like to propose including integration testing for Kerberos on the
>>>>>> Spark 2.4 release:
>>>>>>
>>>>>> https://github.com/apache/spark/pull/22608
>>>>>>
>>>>>>
>>>>>>
>>>>>> Arguments in favor:
>>>>>>
>>>>>> 1) it improves testing coverage on a feature important for
>>>>>> integrating with HDFS deployments
>>>>>>
>>>>>> 2) its intersection with existing code is small - it consists
>>>>>> primarily of new testing code, with a bit of refactoring into 'main' and
>>>>>> 'test' sub-trees. These new tests appear stable.
>>>>>>
>>>>>> 3) Spark 2.4 is still in RC, with outstanding correctness issues.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The argument 'against' that I'm aware of would be the relatively
>>>>>> large size of the PR. I believe this is considered above, but am soliciting
>>>>>> community feedback before committing.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Erik
>>>>>>
>>>>>>
>>>>>>
>>>>>

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

Posted by Xiao Li <ga...@gmail.com>.
We need to strictly follow the backport and release policy. We can't merge
such a new feature into an RC branch or a minor release (e.g., 2.4.1).

Cheers,

Xiao

On Tue, Oct 16, 2018 at 12:48 PM Bolke de Bruin <bd...@gmail.com> wrote:

> Chiming in here. We are in the same boat as Bloomberg.
>
> (But being a release manager often myself I understand the trade-off)
>
> B.
>
> On Tue, Oct 16, 2018 at 9:24 PM Ilan Filonenko <if...@cornell.edu> wrote:
>
>> On Erik's note, would SPARK-23257 be included in, say, 2.4.1? When would
>> the next RC be? I would like to propose the inclusion of the Kerberos
>> feature sooner rather than later as it would increase Spark-on-K8S adoption
>> in production workloads while bringing greater feature parity with Yarn and
>> Mesos. I would like to note that the feature itself is isolated from Core
>> and isolated via the step-based architecture of the Kubernetes
>> Driver/Executor builders.
>>
>> Furthermore, Spark users traditionally use HDFS for storage and in
>> production use-cases these HDFS clusters would be kerberized. At Bloomberg,
>> for example, all of the HDFS clusters are kerberized and for this reason,
>> the only thing stopping our internal Data Science Platform from adopting
>> Spark-on-K8S is this feature.
>>
>> On Tue, Oct 16, 2018 at 10:21 AM Erik Erlandson <ee...@redhat.com>
>> wrote:
>>
>>>
>>> SPARK-23257 merged more recently than I realized. If that isn't on
>>> branch-2.4 then the first question is how soon on the release sequence that
>>> can be adopted
>>>
>>> On Tue, Oct 16, 2018 at 9:33 AM Reynold Xin <rx...@databricks.com> wrote:
>>>
>>>> We shouldn’t merge new features into release branches anymore.
>>>>
>>>> On Tue, Oct 16, 2018 at 6:32 PM Rob Vesse <rv...@dotnetrdf.org> wrote:
>>>>
>>>>> Right now the Kerberos support for Spark on K8S is only on master
>>>>> AFAICT i.e. the feature is not present on branch-2.4
>>>>>
>>>>>
>>>>>
>>>>> Therefore I don’t see any point in adding the tests into branch-2.4
>>>>> unless the plan is to also merge the Kerberos support to branch-2.4
>>>>>
>>>>>
>>>>>
>>>>> Rob
>>>>>
>>>>>
>>>>>
>>>>> *From: *Erik Erlandson <ee...@redhat.com>
>>>>> *Date: *Tuesday, 16 October 2018 at 16:47
>>>>> *To: *dev <de...@spark.apache.org>
>>>>> *Subject: *[DISCUSS][K8S][TESTS] Include Kerberos integration tests
>>>>> for Spark 2.4
>>>>>
>>>>>
>>>>>
>>>>> I'd like to propose including integration testing for Kerberos on the
>>>>> Spark 2.4 release:
>>>>>
>>>>> https://github.com/apache/spark/pull/22608
>>>>>
>>>>>
>>>>>
>>>>> Arguments in favor:
>>>>>
>>>>> 1) it improves testing coverage on a feature important for integrating
>>>>> with HDFS deployments
>>>>>
>>>>> 2) its intersection with existing code is small - it consists
>>>>> primarily of new testing code, with a bit of refactoring into 'main' and
>>>>> 'test' sub-trees. These new tests appear stable.
>>>>>
>>>>> 3) Spark 2.4 is still in RC, with outstanding correctness issues.
>>>>>
>>>>>
>>>>>
>>>>> The argument 'against' that I'm aware of would be the relatively large
>>>>> size of the PR. I believe this is considered above, but am soliciting
>>>>> community feedback before committing.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Erik
>>>>>
>>>>>
>>>>>
>>>>

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

Posted by Bolke de Bruin <bd...@gmail.com>.
Chiming in here. We are in the same boat as Bloomberg.

(But being a release manager often myself I understand the trade-off)

B.

On Tue, Oct 16, 2018 at 9:24 PM Ilan Filonenko <if...@cornell.edu> wrote:

> On Erik's note, would SPARK-23257 be included in, say, 2.4.1? When would
> the next RC be? I would like to propose the inclusion of the Kerberos
> feature sooner rather than later as it would increase Spark-on-K8S adoption
> in production workloads while bringing greater feature parity with Yarn and
> Mesos. I would like to note that the feature itself is isolated from Core
> and isolated via the step-based architecture of the Kubernetes
> Driver/Executor builders.
>
> Furthermore, Spark users traditionally use HDFS for storage and in
> production use-cases these HDFS clusters would be kerberized. At Bloomberg,
> for example, all of the HDFS clusters are kerberized and for this reason,
> the only thing stopping our internal Data Science Platform from adopting
> Spark-on-K8S is this feature.
>
> On Tue, Oct 16, 2018 at 10:21 AM Erik Erlandson <ee...@redhat.com>
> wrote:
>
>>
>> SPARK-23257 merged more recently than I realized. If that isn't on
>> branch-2.4 then the first question is how soon on the release sequence that
>> can be adopted
>>
>> On Tue, Oct 16, 2018 at 9:33 AM Reynold Xin <rx...@databricks.com> wrote:
>>
>>> We shouldn’t merge new features into release branches anymore.
>>>
>>> On Tue, Oct 16, 2018 at 6:32 PM Rob Vesse <rv...@dotnetrdf.org> wrote:
>>>
>>>> Right now the Kerberos support for Spark on K8S is only on master
>>>> AFAICT i.e. the feature is not present on branch-2.4
>>>>
>>>>
>>>>
>>>> Therefore I don’t see any point in adding the tests into branch-2.4
>>>> unless the plan is to also merge the Kerberos support to branch-2.4
>>>>
>>>>
>>>>
>>>> Rob
>>>>
>>>>
>>>>
>>>> *From: *Erik Erlandson <ee...@redhat.com>
>>>> *Date: *Tuesday, 16 October 2018 at 16:47
>>>> *To: *dev <de...@spark.apache.org>
>>>> *Subject: *[DISCUSS][K8S][TESTS] Include Kerberos integration tests
>>>> for Spark 2.4
>>>>
>>>>
>>>>
>>>> I'd like to propose including integration testing for Kerberos on the
>>>> Spark 2.4 release:
>>>>
>>>> https://github.com/apache/spark/pull/22608
>>>>
>>>>
>>>>
>>>> Arguments in favor:
>>>>
>>>> 1) it improves testing coverage on a feature important for integrating
>>>> with HDFS deployments
>>>>
>>>> 2) its intersection with existing code is small - it consists primarily
>>>> of new testing code, with a bit of refactoring into 'main' and 'test'
>>>> sub-trees. These new tests appear stable.
>>>>
>>>> 3) Spark 2.4 is still in RC, with outstanding correctness issues.
>>>>
>>>>
>>>>
>>>> The argument 'against' that I'm aware of would be the relatively large
>>>> size of the PR. I believe this is considered above, but am soliciting
>>>> community feedback before committing.
>>>>
>>>> Cheers,
>>>>
>>>> Erik
>>>>
>>>>
>>>>
>>>

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

Posted by Ilan Filonenko <if...@cornell.edu>.
On Erik's note, would SPARK-23257 be included in, say, 2.4.1? When would
the next RC be? I would like to propose the inclusion of the Kerberos
feature sooner rather than later as it would increase Spark-on-K8S adoption
in production workloads while bringing greater feature parity with YARN and
Mesos. I would like to note that the feature itself is isolated from Core
by the step-based architecture of the Kubernetes Driver/Executor
builders.

Furthermore, Spark users traditionally use HDFS for storage and in
production use-cases these HDFS clusters would be kerberized. At Bloomberg,
for example, all of the HDFS clusters are kerberized and for this reason,
the only thing stopping our internal Data Science Platform from adopting
Spark-on-K8S is this feature.

On Tue, Oct 16, 2018 at 10:21 AM Erik Erlandson <ee...@redhat.com> wrote:

>
> SPARK-23257 merged more recently than I realized. If that isn't on
> branch-2.4 then the first question is how soon on the release sequence that
> can be adopted
>
> On Tue, Oct 16, 2018 at 9:33 AM Reynold Xin <rx...@databricks.com> wrote:
>
>> We shouldn’t merge new features into release branches anymore.
>>
>> On Tue, Oct 16, 2018 at 6:32 PM Rob Vesse <rv...@dotnetrdf.org> wrote:
>>
>>> Right now the Kerberos support for Spark on K8S is only on master AFAICT
>>> i.e. the feature is not present on branch-2.4
>>>
>>>
>>>
>>> Therefore I don’t see any point in adding the tests into branch-2.4
>>> unless the plan is to also merge the Kerberos support to branch-2.4
>>>
>>>
>>>
>>> Rob
>>>
>>>
>>>
>>> *From: *Erik Erlandson <ee...@redhat.com>
>>> *Date: *Tuesday, 16 October 2018 at 16:47
>>> *To: *dev <de...@spark.apache.org>
>>> *Subject: *[DISCUSS][K8S][TESTS] Include Kerberos integration tests for
>>> Spark 2.4
>>>
>>>
>>>
>>> I'd like to propose including integration testing for Kerberos on the
>>> Spark 2.4 release:
>>>
>>> https://github.com/apache/spark/pull/22608
>>>
>>>
>>>
>>> Arguments in favor:
>>>
>>> 1) it improves testing coverage on a feature important for integrating
>>> with HDFS deployments
>>>
>>> 2) its intersection with existing code is small - it consists primarily
>>> of new testing code, with a bit of refactoring into 'main' and 'test'
>>> sub-trees. These new tests appear stable.
>>>
>>> 3) Spark 2.4 is still in RC, with outstanding correctness issues.
>>>
>>>
>>>
>>> The argument 'against' that I'm aware of would be the relatively large
>>> size of the PR. I believe this is considered above, but am soliciting
>>> community feedback before committing.
>>>
>>> Cheers,
>>>
>>> Erik
>>>
>>>
>>>
>>

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

Posted by Erik Erlandson <ee...@redhat.com>.
SPARK-23257 merged more recently than I realized. If that isn't on
branch-2.4, then the first question is how soon in the release sequence it
can be adopted.

On Tue, Oct 16, 2018 at 9:33 AM Reynold Xin <rx...@databricks.com> wrote:

> We shouldn’t merge new features into release branches anymore.
>
> On Tue, Oct 16, 2018 at 6:32 PM Rob Vesse <rv...@dotnetrdf.org> wrote:
>
>> Right now the Kerberos support for Spark on K8S is only on master AFAICT
>> i.e. the feature is not present on branch-2.4
>>
>>
>>
>> Therefore I don’t see any point in adding the tests into branch-2.4
>> unless the plan is to also merge the Kerberos support to branch-2.4
>>
>>
>>
>> Rob
>>
>>
>>
>> *From: *Erik Erlandson <ee...@redhat.com>
>> *Date: *Tuesday, 16 October 2018 at 16:47
>> *To: *dev <de...@spark.apache.org>
>> *Subject: *[DISCUSS][K8S][TESTS] Include Kerberos integration tests for
>> Spark 2.4
>>
>>
>>
>> I'd like to propose including integration testing for Kerberos on the
>> Spark 2.4 release:
>>
>> https://github.com/apache/spark/pull/22608
>>
>>
>>
>> Arguments in favor:
>>
>> 1) it improves testing coverage on a feature important for integrating
>> with HDFS deployments
>>
>> 2) its intersection with existing code is small - it consists primarily
>> of new testing code, with a bit of refactoring into 'main' and 'test'
>> sub-trees. These new tests appear stable.
>>
>> 3) Spark 2.4 is still in RC, with outstanding correctness issues.
>>
>>
>>
>> The argument 'against' that I'm aware of would be the relatively large
>> size of the PR. I believe this is considered above, but am soliciting
>> community feedback before committing.
>>
>> Cheers,
>>
>> Erik
>>
>>
>>
>

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

Posted by Reynold Xin <rx...@databricks.com>.
We shouldn’t merge new features into release branches anymore.

On Tue, Oct 16, 2018 at 6:32 PM Rob Vesse <rv...@dotnetrdf.org> wrote:

> Right now the Kerberos support for Spark on K8S is only on master AFAICT
> i.e. the feature is not present on branch-2.4
>
>
>
> Therefore I don’t see any point in adding the tests into branch-2.4 unless
> the plan is to also merge the Kerberos support to branch-2.4
>
>
>
> Rob
>
>
>
> *From: *Erik Erlandson <ee...@redhat.com>
> *Date: *Tuesday, 16 October 2018 at 16:47
> *To: *dev <de...@spark.apache.org>
> *Subject: *[DISCUSS][K8S][TESTS] Include Kerberos integration tests for
> Spark 2.4
>
>
>
> I'd like to propose including integration testing for Kerberos on the
> Spark 2.4 release:
>
> https://github.com/apache/spark/pull/22608
>
>
>
> Arguments in favor:
>
> 1) it improves testing coverage on a feature important for integrating
> with HDFS deployments
>
> 2) its intersection with existing code is small - it consists primarily of
> new testing code, with a bit of refactoring into 'main' and 'test'
> sub-trees. These new tests appear stable.
>
> 3) Spark 2.4 is still in RC, with outstanding correctness issues.
>
>
>
> The argument 'against' that I'm aware of would be the relatively large
> size of the PR. I believe this is considered above, but am soliciting
> community feedback before committing.
>
> Cheers,
>
> Erik
>
>
>

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

Posted by Yinan Li <li...@gmail.com>.
Yep, the Kerberos support for k8s is in master but not in branch-2.4. I
see no reason to get the integration tests into 2.4, since they depend on a
feature that exists only in master.

On Tue, Oct 16, 2018 at 9:32 AM Rob Vesse <rv...@dotnetrdf.org> wrote:

> Right now the Kerberos support for Spark on K8S is only on master AFAICT
> i.e. the feature is not present on branch-2.4
>
>
>
> Therefore I don’t see any point in adding the tests into branch-2.4 unless
> the plan is to also merge the Kerberos support to branch-2.4
>
>
>
> Rob
>
>
>
> *From: *Erik Erlandson <ee...@redhat.com>
> *Date: *Tuesday, 16 October 2018 at 16:47
> *To: *dev <de...@spark.apache.org>
> *Subject: *[DISCUSS][K8S][TESTS] Include Kerberos integration tests for
> Spark 2.4
>
>
>
> I'd like to propose including integration testing for Kerberos on the
> Spark 2.4 release:
>
> https://github.com/apache/spark/pull/22608
>
>
>
> Arguments in favor:
>
> 1) it improves testing coverage on a feature important for integrating
> with HDFS deployments
>
> 2) its intersection with existing code is small - it consists primarily of
> new testing code, with a bit of refactoring into 'main' and 'test'
> sub-trees. These new tests appear stable.
>
> 3) Spark 2.4 is still in RC, with outstanding correctness issues.
>
>
>
> The argument 'against' that I'm aware of would be the relatively large
> size of the PR. I believe this is considered above, but am soliciting
> community feedback before committing.
>
> Cheers,
>
> Erik
>
>
>

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

Posted by Rob Vesse <rv...@dotnetrdf.org>.
Right now the Kerberos support for Spark on K8S is only on master AFAICT, i.e. the feature is not present on branch-2.4.

 

Therefore I don’t see any point in adding the tests into branch-2.4 unless the plan is to also merge the Kerberos support to branch-2.4.

 

Rob

 

From: Erik Erlandson <ee...@redhat.com>
Date: Tuesday, 16 October 2018 at 16:47
To: dev <de...@spark.apache.org>
Subject: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

 

I'd like to propose including integration testing for Kerberos on the Spark 2.4 release:

https://github.com/apache/spark/pull/22608

 

Arguments in favor:

1) it improves testing coverage on a feature important for integrating with HDFS deployments

2) its intersection with existing code is small - it consists primarily of new testing code, with a bit of refactoring into 'main' and 'test' sub-trees. These new tests appear stable.

3) Spark 2.4 is still in RC, with outstanding correctness issues.

 

The argument 'against' that I'm aware of would be the relatively large size of the PR. I believe this is considered above, but am soliciting community feedback before committing.

Cheers,

Erik

 


Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

Posted by Felix Cheung <fe...@hotmail.com>.
I’m in favor of it. If you check the PR, it consists of a few isolated script changes, and all of them are test-only. It should have low impact on the release while giving much better integration test coverage.


________________________________
From: Erik Erlandson <ee...@redhat.com>
Sent: Tuesday, October 16, 2018 8:20 AM
To: dev
Subject: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

I'd like to propose including integration testing for Kerberos on the Spark 2.4 release:
https://github.com/apache/spark/pull/22608

Arguments in favor:
1) it improves testing coverage on a feature important for integrating with HDFS deployments
2) its intersection with existing code is small - it consists primarily of new testing code, with a bit of refactoring into 'main' and 'test' sub-trees. These new tests appear stable.
3) Spark 2.4 is still in RC, with outstanding correctness issues.

The argument 'against' that I'm aware of would be the relatively large size of the PR. I believe this is considered above, but am soliciting community feedback before committing.
Cheers,
Erik