You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by ifilonenko <gi...@git.apache.org> on 2018/04/18 05:29:13 UTC

[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

GitHub user ifilonenko opened a pull request:

    https://github.com/apache/spark/pull/21092

    [SPARK-23984][K8S][WIP] Initial Python Bindings for PySpark on K8s

    ## What changes were proposed in this pull request?
    
    Introducing Python Bindings for PySpark.
    
    - [ ] Running PySpark Jobs
    - [ ] Increased Default Memory Overhead value
    - [ ] Dependency Management for virtualenv/conda
    
    ## How was this patch tested?
    
    This patch was tested with 
    
    - [ ] Unit Tests
    - [ ] Integration tests with [this addition](https://github.com/apache-spark-on-k8s/spark-integration/pull/46)
    ```
    KubernetesSuite:
    - Run SparkPi with no resources
    - Run SparkPi with a very long application name.
    - Run SparkPi with a master URL without a scheme.
    - Run SparkPi with an argument.
    - Run SparkPi with custom labels, annotations, and environment variables.
    - Run SparkPi with a test secret mounted into the driver and executor pods
    - Run extraJVMOptions check on driver
    - Run SparkRemoteFileTest using a remote data file
    - Run PySpark on simple pi.py example
    Run completed in 4 minutes, 3 seconds.
    Total number of tests run: 9
    Suites: completed 2, aborted 0
    Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
    All tests passed.
    ```
    
    ## Problematic Comments from [ifilonenko]
    
    - [ ] Currently Docker image is built with Python2 --> needs to be generic for Python2/3
    - [ ] `--py-files` is properly distributing but it seems that example commands like
    ```
    exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner /opt/spark/examples/src/main/python/pi.py /opt/spark/examples/src/main/python/sort.py
    ```
    is causing errors of `/opt/spark/examples/src/main/python/pi.py` thinking that `/opt/spark/examples/src/main/python/sort.py is an argument
    
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ifilonenko/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21092.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21092
    
----
commit fb5b9ed83d4e5ed73bc44b9d719ac0e52702655e
Author: Ilan Filonenko <if...@...>
Date:   2018-04-16T03:23:43Z

    initial architecture for PySpark w/o dockerfile work

commit b7b3db0abfbf425120fa21cc61e603c5d766f8af
Author: Ilan Filonenko <if...@...>
Date:   2018-04-17T19:13:45Z

    included entrypoint logic

commit 98cef8ceb0f04cfcefbc482c2a0fe39c75f620c4
Author: Ilan Filonenko <if...@...>
Date:   2018-04-18T02:22:55Z

    satisfying integration tests

commit dc670dcd07944ae30b9b425c26250a21986b2699
Author: Ilan Filonenko <if...@...>
Date:   2018-04-18T05:20:12Z

    end-to-end working pyspark

commit eabe4b9b784f37cca3dd9bcff17110944b50f5c8
Author: Ilan Filonenko <if...@...>
Date:   2018-04-18T05:20:42Z

    Merge pull request #1 from ifilonenko/py-spark
    
    Initial architecture for PySpark w/o dependency management

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r187646801
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/bindings/PythonDriverFeatureStep.scala ---
    @@ -0,0 +1,75 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.deploy.k8s.features.bindings
    +
    +import scala.collection.JavaConverters._
    +
    +import io.fabric8.kubernetes.api.model.ContainerBuilder
    +import io.fabric8.kubernetes.api.model.EnvVarBuilder
    +import io.fabric8.kubernetes.api.model.HasMetadata
    +
    +import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod}
    +import org.apache.spark.deploy.k8s.Constants._
    +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf
    +import org.apache.spark.deploy.k8s.KubernetesUtils
    +import org.apache.spark.deploy.k8s.features.KubernetesFeatureConfigStep
    +
    +private[spark] class PythonDriverFeatureStep(
    +  kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
    +  extends KubernetesFeatureConfigStep {
    +  override def configurePod(pod: SparkPod): SparkPod = {
    +    val roleConf = kubernetesConf.roleSpecificConf
    +    require(roleConf.mainAppResource.isDefined, "PySpark Main Resource must be defined")
    +    val maybePythonArgs = Option(roleConf.appArgs).filter(_.nonEmpty).map(
    +      s =>
    +        new EnvVarBuilder()
    +          .withName(ENV_PYSPARK_ARGS)
    +          .withValue(s.mkString(","))
    +          .build())
    +    val maybePythonFiles = kubernetesConf.pyFiles().map(
    +      pyFiles =>
    +        new EnvVarBuilder()
    +          .withName(ENV_PYSPARK_FILES)
    +          .withValue(KubernetesUtils.resolveFileUrisAndPath(pyFiles.split(","))
    --- End diff --
    
    Maybe add a comment on why we need switch from ","s to ":"s.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3694/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3080/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status success
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2365/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183161726
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---
    @@ -88,15 +94,22 @@ private[spark] class BasicDriverFeatureStep(
             .addToRequests("memory", driverMemoryQuantity)
             .addToLimits("memory", driverMemoryQuantity)
             .endResources()
    -      .addToArgs("driver")
    +      .addToArgs(driverDockerContainer)
           .addToArgs("--properties-file", SPARK_CONF_PATH)
           .addToArgs("--class", conf.roleSpecificConf.mainClass)
    -      // The user application jar is merged into the spark.jars list and managed through that
    -      // property, so there is no need to reference it explicitly here.
    -      .addToArgs(SparkLauncher.NO_RESOURCE)
    -      .addToArgs(conf.roleSpecificConf.appArgs: _*)
    -      .build()
     
    +    val driverContainer =
    +      if (driverDockerContainer == "driver-py") {
    --- End diff --
    
    > So what about applications which need Python support (e.g. have Python UDFS) but don't use a Python driver process?
    
    Think that's up to the user to make it work - I don't see this being specifically handled by the other cluster managers.
    
    The goal of this PR should be to bring Kubernetes up to par with the other cluster managers with respect to what they provide.Do the other cluster managers provide any specific support for this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90424/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by erikerlandson <gi...@git.apache.org>.
Github user erikerlandson commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r182563748
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh ---
    @@ -62,6 +69,14 @@ case "$SPARK_K8S_CMD" in
           "$@"
         )
         ;;
    +  driver-py)
    +    CMD=(
    +      "$SPARK_HOME/bin/spark-submit"
    +      --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
    +      --deploy-mode client
    +      "$@" $PYSPARK_PRIMARY $PYSPARK_SECONDARY
    --- End diff --
    
    +1


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183162517
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---
    @@ -88,15 +94,22 @@ private[spark] class BasicDriverFeatureStep(
             .addToRequests("memory", driverMemoryQuantity)
             .addToLimits("memory", driverMemoryQuantity)
             .endResources()
    -      .addToArgs("driver")
    +      .addToArgs(driverDockerContainer)
           .addToArgs("--properties-file", SPARK_CONF_PATH)
           .addToArgs("--class", conf.roleSpecificConf.mainClass)
    -      // The user application jar is merged into the spark.jars list and managed through that
    -      // property, so there is no need to reference it explicitly here.
    -      .addToArgs(SparkLauncher.NO_RESOURCE)
    -      .addToArgs(conf.roleSpecificConf.appArgs: _*)
    -      .build()
     
    +    val driverContainer =
    +      if (driverDockerContainer == "driver-py") {
    --- End diff --
    
    We currently are only running the Python and future R step when we are leveraging a Python (or R) driver process. Else the user would just specify the spark-py docker-image no? and then just continue to run a non-Python driver process. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r194112328
  
    --- Diff: docs/running-on-kubernetes.md ---
    @@ -624,4 +624,20 @@ specific to Spark on Kubernetes.
        <code>spark.kubernetes.executor.secrets.ENV_VAR=spark-secret:key</code>.
       </td>
     </tr>
    +<tr>
    +  <td><code>spark.kubernetes.memoryOverheadFactor</code></td>
    +  <td><code>0.1</code></td>
    +  <td>
    +    This sets the Memory Overhead Factor that will allocate memory to non-JVM jobs which in the case of JVM tasks will default to 0.10 and 0.40 for non-JVM jobs.
    --- End diff --
    
    I think we can maybe improve this documentation a little bit. It's not so much how much memory is set aside for non-JVM jobs, it's how much memory is set aside for non-JVM memory, including off-heap allocations, non-JVM jobs (like Python or R), and system processes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183152677
  
    --- Diff: bin/docker-image-tool.sh ---
    @@ -86,7 +94,8 @@ Commands:
       push        Push a pre-built image to a registry. Requires a repository address to be provided.
     
     Options:
    -  -f file     Dockerfile to build. By default builds the Dockerfile shipped with Spark.
    +  -f file     Dockerfile to build for JVM based Jobs. By default builds the Dockerfile shipped with Spark.
    +  -p file     Dockerfile with Python baked in. By default builds the Dockerfile shipped with Spark.
    --- End diff --
    
    One (future concern) is how we would to handle the overlay with both Python and R at the same time.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91374/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r189982571
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala ---
    @@ -54,7 +54,8 @@ private[spark] class BasicExecutorFeatureStep(
     
       private val memoryOverheadMiB = kubernetesConf
         .get(EXECUTOR_MEMORY_OVERHEAD)
    -    .getOrElse(math.max((MEMORY_OVERHEAD_FACTOR * executorMemoryMiB).toInt,
    +    .getOrElse(math.max(
    +      (kubernetesConf.get(MEMORY_OVERHEAD_FACTOR).getOrElse(0.1) * executorMemoryMiB).toInt,
    --- End diff --
    
    Yeah so multiple yes, but since have two instances of this magic 0.1 constant I'd rather have it shared somewhere incase we go to update this in the future and don't get everywhere. Could also be a shared constant if we want instead rather than a getter for memory overhead factor, either way keeps us honest.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by lucashu1 <gi...@git.apache.org>.
Github user lucashu1 commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Sorry in advance if this is the wrong place to be asking this! 
    
    Does this PR mean that we'll be able to create SparkContexts using PySpark's [`SparkSession.Builder`](https://spark.apache.org/docs/preview/api/python/pyspark.sql.html#pyspark.sql.SparkSession.Builder) with `master` set to `k8s://<...>:<...>`, and have the resulting jobs run on spark-on-k8s, instead of on local/standalone? 
    
    E.g.:
    ```
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.master('k8s://https://kubernetes:443').getOrCreate()
    ```
    
    I'm trying to use PySpark in a Jupyter notebook that's running inside a Kubernetes pod, and have it use spark-on-k8s instead of resorting to using `local[*]` as `master`. 
    
    Till now, I've been getting an error saying that:
    
    > Error: Python applications are currently not supported for Kubernetes.
    
    whenever I try to use `k8s://<...>` as `master`.
    
    Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2822/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    @shaneknapp @ssuchter integration tests seem to be failing not due to this PR, but in general. Please investigate, because this PR does pass integration tests + an extra PySpark test.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #90561 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90561/testReport)** for PR 21092 at commit [`72953a3`](https://github.com/apache/spark/commit/72953a3ef42ce0aa0d4b55c0f213198b4b468907).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r182607048
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---
    @@ -88,15 +94,22 @@ private[spark] class BasicDriverFeatureStep(
             .addToRequests("memory", driverMemoryQuantity)
             .addToLimits("memory", driverMemoryQuantity)
             .endResources()
    -      .addToArgs("driver")
    +      .addToArgs(driverDockerContainer)
           .addToArgs("--properties-file", SPARK_CONF_PATH)
           .addToArgs("--class", conf.roleSpecificConf.mainClass)
    -      // The user application jar is merged into the spark.jars list and managed through that
    -      // property, so there is no need to reference it explicitly here.
    -      .addToArgs(SparkLauncher.NO_RESOURCE)
    -      .addToArgs(conf.roleSpecificConf.appArgs: _*)
    -      .build()
     
    +    val driverContainer =
    +      if (driverDockerContainer == "driver-py") {
    --- End diff --
    
    I think in general I'd prefer having two separate step types here. They can share some logic in either a utils class or a shared superclass. But you only apply one step type for Java apps vs one step type for Python apps.
    
    Another way is to have the basic driver step only do work that would be strictly agnostic of python vs java, and then have a separate step for either Java or Python; the orchestrator picks which one to invoke based on the app resource type. To do this I think the step's constructor needs to take more than just the `KubernetesConf` as an argument - it needs to take the appropriate specifically-typed `MainAppResource` as an argument in the constructor as well. This breaks the convention that we've set so far but for now that's probably ok, as long as we don't get parameter length blowup as we go forward.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r187645297
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -101,17 +112,29 @@ private[spark] object KubernetesConf {
           appId: String,
           mainAppResource: Option[MainAppResource],
           mainClass: String,
    -      appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
    +      appArgs: Array[String],
    +      maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
         val sparkConfWithMainAppJar = sparkConf.clone()
    +    val additionalFiles = mutable.ArrayBuffer.empty[String]
         mainAppResource.foreach {
    -      case JavaMainAppResource(res) =>
    -        val previousJars = sparkConf
    -          .getOption("spark.jars")
    -          .map(_.split(","))
    -          .getOrElse(Array.empty)
    -        if (!previousJars.contains(res)) {
    -          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    -        }
    +        case JavaMainAppResource(res) =>
    +          val previousJars = sparkConf
    +            .getOption("spark.jars")
    +            .map(_.split(","))
    +            .getOrElse(Array.empty)
    +          if (!previousJars.contains(res)) {
    +            sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    +          }
    +        case nonJVM: NonJVMResource =>
    --- End diff --
    
    Maybe add a comment since R isn't currently integrated could be a bit difficult to infer?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2973/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by foxish <gi...@git.apache.org>.
Github user foxish commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    jenkins, ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r193801841
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -102,17 +110,30 @@ private[spark] object KubernetesConf {
           appId: String,
           mainAppResource: Option[MainAppResource],
           mainClass: String,
    -      appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
    +      appArgs: Array[String],
    +      maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
         val sparkConfWithMainAppJar = sparkConf.clone()
    +    val additionalFiles = mutable.ArrayBuffer.empty[String]
         mainAppResource.foreach {
    -      case JavaMainAppResource(res) =>
    -        val previousJars = sparkConf
    -          .getOption("spark.jars")
    -          .map(_.split(","))
    -          .getOrElse(Array.empty)
    -        if (!previousJars.contains(res)) {
    -          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    -        }
    +        case JavaMainAppResource(res) =>
    +          val previousJars = sparkConf
    +            .getOption("spark.jars")
    +            .map(_.split(","))
    +            .getOrElse(Array.empty)
    +          if (!previousJars.contains(res)) {
    +            sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    +          }
    +        // The function of this outer match is to account for multiple nonJVM
    +        // bindings that will all have increased MEMORY_OVERHEAD_FACTOR to 0.4
    +        case nonJVM: NonJVMResource =>
    +          nonJVM match {
    +            case PythonMainAppResource(res) =>
    +              additionalFiles += res
    +              maybePyFiles.foreach{maybePyFiles =>
    +                additionalFiles.appendAll(maybePyFiles.split(","))}
    +              sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_MAIN_APP_RESOURCE, res)
    +          }
    +          sparkConfWithMainAppJar.set(MEMORY_OVERHEAD_FACTOR, 0.4)
    --- End diff --
    
    Users have an ability to overwrite. Hence the change to `setIfMissing()`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    awesome!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #91573 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91573/testReport)** for PR 21092 at commit [`a61d897`](https://github.com/apache/spark/commit/a61d8973a8961fa69e50757f049b04bda292a088).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r190981001
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/bindings/PythonDriverFeatureStep.scala ---
    @@ -0,0 +1,73 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.deploy.k8s.features.bindings
    +
    +import scala.collection.JavaConverters._
    +
    +import io.fabric8.kubernetes.api.model.{ContainerBuilder, EnvVarBuilder, HasMetadata}
    +
    +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesDriverSpecificConf, KubernetesUtils, SparkPod}
    +import org.apache.spark.deploy.k8s.Constants._
    +import org.apache.spark.deploy.k8s.features.KubernetesFeatureConfigStep
    +
    +private[spark] class PythonDriverFeatureStep(
    +  kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
    +  extends KubernetesFeatureConfigStep {
    +  override def configurePod(pod: SparkPod): SparkPod = {
    +    val roleConf = kubernetesConf.roleSpecificConf
    +    require(roleConf.mainAppResource.isDefined, "PySpark Main Resource must be defined")
    +    val maybePythonArgs = Option(roleConf.appArgs).filter(_.nonEmpty).map(
    +      s =>
    --- End diff --
    
    Use something more descriptive than `s`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183155128
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---
    @@ -88,15 +94,22 @@ private[spark] class BasicDriverFeatureStep(
             .addToRequests("memory", driverMemoryQuantity)
             .addToLimits("memory", driverMemoryQuantity)
             .endResources()
    -      .addToArgs("driver")
    +      .addToArgs(driverDockerContainer)
           .addToArgs("--properties-file", SPARK_CONF_PATH)
           .addToArgs("--class", conf.roleSpecificConf.mainClass)
    -      // The user application jar is merged into the spark.jars list and managed through that
    -      // property, so there is no need to reference it explicitly here.
    -      .addToArgs(SparkLauncher.NO_RESOURCE)
    -      .addToArgs(conf.roleSpecificConf.appArgs: _*)
    -      .build()
     
    +    val driverContainer =
    +      if (driverDockerContainer == "driver-py") {
    --- End diff --
    
    So what about applications which need Python support (e.g. have Python UDFS) but don't use a Python driver process?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3078/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r192241158
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh ---
    @@ -53,6 +53,28 @@ if [ -n "$SPARK_MOUNTED_FILES_DIR" ]; then
       cp -R "$SPARK_MOUNTED_FILES_DIR/." .
     fi
     
    +if [ -n "$PYSPARK_FILES" ]; then
    +    PYTHONPATH="$PYTHONPATH:$PYSPARK_FILES"
    +fi
    +
    +PYSPARK_ARGS=""
    +if [ -n "$PYSPARK_APP_ARGS" ]; then
    +    PYSPARK_ARGS="$PYSPARK_APP_ARGS"
    +fi
    +
    +
    +if [ "$PYSPARK_PYTHON_VERSION" == "2" ]; then
    --- End diff --
    
    We set the PySpark version to default to 2 in the configs


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3761/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r192448946
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -102,17 +110,30 @@ private[spark] object KubernetesConf {
           appId: String,
           mainAppResource: Option[MainAppResource],
           mainClass: String,
    -      appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
    +      appArgs: Array[String],
    +      maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
         val sparkConfWithMainAppJar = sparkConf.clone()
    +    val additionalFiles = mutable.ArrayBuffer.empty[String]
         mainAppResource.foreach {
    -      case JavaMainAppResource(res) =>
    -        val previousJars = sparkConf
    -          .getOption("spark.jars")
    -          .map(_.split(","))
    -          .getOrElse(Array.empty)
    -        if (!previousJars.contains(res)) {
    -          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    -        }
    +        case JavaMainAppResource(res) =>
    +          val previousJars = sparkConf
    +            .getOption("spark.jars")
    +            .map(_.split(","))
    +            .getOrElse(Array.empty)
    +          if (!previousJars.contains(res)) {
    +            sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    +          }
    +        // The function of this outer match is to account for multiple nonJVM
    +        // bindings that will all have increased MEMORY_OVERHEAD_FACTOR to 0.4
    +        case nonJVM: NonJVMResource =>
    +          nonJVM match {
    +            case PythonMainAppResource(res) =>
    +              additionalFiles += res
    +              maybePyFiles.foreach{maybePyFiles =>
    +                additionalFiles.appendAll(maybePyFiles.split(","))}
    +              sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_MAIN_APP_RESOURCE, res)
    +          }
    +          sparkConfWithMainAppJar.set(MEMORY_OVERHEAD_FACTOR, 0.4)
    --- End diff --
    
    Yup, you can see my statement about not overriding the explicitly user provided value in comment on the 20th ("if the user has specified a different value don't think we should override it").
    
    So this logic, as it stands, is K8s specific and I don't think we we can change how YARN chooses its memory overhead in a minor release, so I'd expect this to remain K8s specific until at least 3.0 when we can evaluate if we want to change this in YARN as well.
    
    The memory overhead configuration notice done in the YARN page right now 
    (see `spark.yarn.am.memoryOverhead` on http://spark.apache.org/docs/latest/running-on-yarn.html ). So I would document this in http://spark.apache.org/docs/latest/running-on-kubernetes.html#spark-properties e.g. `./docs/running-on-kubernetes.md`).
    
    As for intuitive I'd argue that this actually is more intuitive than what we do in YARN, we know that users who run R & Python need more non-JVM heap space and many users don't know to think about this until their job fails. We can take advantage of our knowledge to handle this setting for the user more often. You can see how often this confuses folks on the list, docs, and stack overflow by looking at "memory overhead exceeded" and "Container killed by YARN for exceeding memory limits" and similar.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by kokes <gi...@git.apache.org>.
Github user kokes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r193805391
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
    @@ -154,6 +176,24 @@ private[spark] object Config extends Logging {
           .checkValue(interval => interval > 0, s"Logging interval must be a positive time value.")
           .createWithDefaultString("1s")
     
    +  val MEMORY_OVERHEAD_FACTOR =
    +    ConfigBuilder("spark.kubernetes.memoryOverheadFactor")
    +      .doc("This sets the Memory Overhead Factor that will allocate memory to non-JVM jobs " +
    +        "which in the case of JVM tasks will default to 0.10 and 0.40 for non-JVM jobs")
    +      .doubleConf
    +      .checkValue(mem_overhead => mem_overhead >= 0 && mem_overhead < 1,
    +        "Ensure that memory overhead is a double between 0 --> 1.0")
    +      .createWithDefault(0.1)
    +
    +  val PYSPARK_MAJOR_PYTHON_VERSION =
    +    ConfigBuilder("spark.kubernetes.pyspark.pythonversion")
    +      .doc("This sets the python version. Either 2 or 3. (Python2 or Python3)")
    +      .stringConf
    +      .checkValue(pv => List("2", "3").contains(pv),
    +        "Ensure that Python Version is either Python2 or Python3")
    +      .createWithDefault("2")
    --- End diff --
    
    There is only ~18 months of support left for Python 2. Python 3 has been around for 10 years and unless there’s a good reason, I think it should be the default. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186244157
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -101,17 +112,29 @@ private[spark] object KubernetesConf {
           appId: String,
           mainAppResource: Option[MainAppResource],
           mainClass: String,
    -      appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
    +      appArgs: Array[String],
    +      maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
         val sparkConfWithMainAppJar = sparkConf.clone()
    +    val additionalFiles = mutable.ArrayBuffer.empty[String]
         mainAppResource.foreach {
    -      case JavaMainAppResource(res) =>
    -        val previousJars = sparkConf
    -          .getOption("spark.jars")
    -          .map(_.split(","))
    -          .getOrElse(Array.empty)
    -        if (!previousJars.contains(res)) {
    -          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    -        }
    +        case JavaMainAppResource(res) =>
    +          val previousJars = sparkConf
    +            .getOption("spark.jars")
    +            .map(_.split(","))
    +            .getOrElse(Array.empty)
    +          if (!previousJars.contains(res)) {
    +            sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    +          }
    +        case nonJVM: NonJVMResource =>
    --- End diff --
    
    Why can't we just match `PythonMainAppResource` immediately here - why the two layers of matching?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r192270082
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -102,17 +110,30 @@ private[spark] object KubernetesConf {
           appId: String,
           mainAppResource: Option[MainAppResource],
           mainClass: String,
    -      appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
    +      appArgs: Array[String],
    +      maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
         val sparkConfWithMainAppJar = sparkConf.clone()
    +    val additionalFiles = mutable.ArrayBuffer.empty[String]
         mainAppResource.foreach {
    -      case JavaMainAppResource(res) =>
    -        val previousJars = sparkConf
    -          .getOption("spark.jars")
    -          .map(_.split(","))
    -          .getOrElse(Array.empty)
    -        if (!previousJars.contains(res)) {
    -          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    -        }
    +        case JavaMainAppResource(res) =>
    +          val previousJars = sparkConf
    +            .getOption("spark.jars")
    +            .map(_.split(","))
    +            .getOrElse(Array.empty)
    +          if (!previousJars.contains(res)) {
    +            sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    +          }
    +        // The function of this outer match is to account for multiple nonJVM
    +        // bindings that will all have increased MEMORY_OVERHEAD_FACTOR to 0.4
    +        case nonJVM: NonJVMResource =>
    +          nonJVM match {
    +            case PythonMainAppResource(res) =>
    +              additionalFiles += res
    +              maybePyFiles.foreach{maybePyFiles =>
    +                additionalFiles.appendAll(maybePyFiles.split(","))}
    +              sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_MAIN_APP_RESOURCE, res)
    +          }
    +          sparkConfWithMainAppJar.set(MEMORY_OVERHEAD_FACTOR, 0.4)
    --- End diff --
    
    @holdenk this behavior isn't intuitive, that the memory overhead factor default will be calculated differently depending on what language binding the job is running with. Is there a good page in Spark's configuration documentation on https://spark.apache.org/docs/latest/ where this should be documented? Is this logic Kubernetes specific?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2753/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2974/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2416/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r192245644
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile ---
    @@ -0,0 +1,33 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +ARG base_img
    +FROM $base_img
    +WORKDIR /
    +COPY python /opt/spark/python
    +RUN apk add --no-cache python && \
    +    python -m ensurepip && \
    +    rm -r /usr/lib/python*/ensurepip && \
    +    pip install --upgrade pip setuptools && \
    +    rm -r /root/.cache
    --- End diff --
    
    Yes


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3748/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186244580
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/bindings/PythonDriverFeatureStep.scala ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.deploy.k8s.features.bindings
    +
    +import io.fabric8.kubernetes.api.model.ContainerBuilder
    +import io.fabric8.kubernetes.api.model.HasMetadata
    +
    +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesRoleSpecificConf, SparkPod}
    +import org.apache.spark.deploy.k8s.Constants._
    +import org.apache.spark.deploy.k8s.KubernetesUtils
    +import org.apache.spark.deploy.k8s.features.KubernetesFeatureConfigStep
    +
    +private[spark] class PythonDriverFeatureStep(
    +  kubernetesConf: KubernetesConf[_ <: KubernetesRoleSpecificConf])
    +  extends KubernetesFeatureConfigStep {
    +  override def configurePod(pod: SparkPod): SparkPod = {
    +    val mainResource = kubernetesConf.pySparkMainResource()
    +    require(mainResource.isDefined, "PySpark Main Resource must be defined")
    +    val otherPyFiles = kubernetesConf.pyFiles().map(pyFile =>
    +      KubernetesUtils.resolveFileUrisAndPath(pyFile.split(","))
    +        .mkString(":")).getOrElse("")
    +    val withPythonPrimaryFileContainer = new ContainerBuilder(pod.container)
    +      .addNewEnv()
    +        .withName(ENV_PYSPARK_ARGS)
    +        .withValue(kubernetesConf.pySparkAppArgs().getOrElse(""))
    --- End diff --
    
    Avoid adding empty env vars. The entrypoint should be able to detect the presence of the environment variable being set or not entirely. Then you should only attach these environment variables if they're present.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91530/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status failure
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3692/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r189986135
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile ---
    @@ -0,0 +1,34 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +ARG base_img
    +FROM $base_img
    +WORKDIR /
    +RUN mkdir ${SPARK_HOME}/python
    +COPY python/lib ${SPARK_HOME}/python/lib
    +RUN apk add --no-cache python && \
    +    apk add --no-cache python3 && \
    +    python -m ensurepip && \
    +    python3 -m ensurepip && \
    +    rm -r /usr/lib/python*/ensurepip && \
    +    pip install --upgrade pip setuptools && \
    --- End diff --
    
    ping


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #90660 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90660/testReport)** for PR 21092 at commit [`72953a3`](https://github.com/apache/spark/commit/72953a3ef42ce0aa0d4b55c0f213198b4b468907).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2365/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #91394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91394/testReport)** for PR 21092 at commit [`24a704e`](https://github.com/apache/spark/commit/24a704e74f2c5816e5ea60dbf607be00c090ae8b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by erikerlandson <gi...@git.apache.org>.
Github user erikerlandson commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183161979
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile ---
    @@ -0,0 +1,33 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +ARG base_img
    +FROM $base_img
    +WORKDIR /
    +COPY python /opt/spark/python
    +RUN apk add --no-cache python && \
    +    python -m ensurepip && \
    +    rm -r /usr/lib/python*/ensurepip && \
    +    pip install --upgrade pip setuptools && \
    +    rm -r /root/.cache
    +ENV PYTHON_VERSION 2.7.13
    --- End diff --
    
    I think a canonical container should include both. My instinct is that a user should be able to "force" the use of one or the other. If someone is invoking spark-submit in cluster-mode, with a supplied python file, some kind of CLI argument (--conf or otherwise) seems like the only totally foolproof way to identify that for the eventual pod construction, but maybe there is a better way?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Integration tests are meant to be in this repository but we haven't gotten there yet. See https://github.com/apache/spark/pull/20697


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...

Posted by foxish <gi...@git.apache.org>.
Github user foxish commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    cc @holdenk 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3091/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r190980689
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---
    @@ -48,7 +48,8 @@ private[spark] class BasicDriverFeatureStep(
       private val driverMemoryMiB = conf.get(DRIVER_MEMORY)
       private val memoryOverheadMiB = conf
         .get(DRIVER_MEMORY_OVERHEAD)
    -    .getOrElse(math.max((MEMORY_OVERHEAD_FACTOR * driverMemoryMiB).toInt, MEMORY_OVERHEAD_MIN_MIB))
    +    .getOrElse(math.max((conf.get(MEMORY_OVERHEAD_FACTOR).getOrElse(0.1) * driverMemoryMiB).toInt,
    --- End diff --
    
    Seems a bit strange we set the default to 0.1 here while in `KubernetesConf#createDriverConf` we're setting it to 0.4 if it's missing.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #91537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91537/testReport)** for PR 21092 at commit [`ab92913`](https://github.com/apache/spark/commit/ab92913f1d0c303a5ff6b2937019d5e47c25611d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...

Posted by erikerlandson <gi...@git.apache.org>.
Github user erikerlandson commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Thanks @ifilonenko !
    I'm interested in figuring out what it means for the container images to be "python 2/3 generic" - does that imply being able to run either, based on submit parameters?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186591608
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilder.scala ---
    @@ -29,18 +31,36 @@ private[spark] class KubernetesDriverBuilder(
           new DriverServiceFeatureStep(_),
         provideSecretsStep: (KubernetesConf[_ <: KubernetesRoleSpecificConf]
           => MountSecretsFeatureStep) =
    -      new MountSecretsFeatureStep(_)) {
    +      new MountSecretsFeatureStep(_),
    +    provideJavaStep: (
    +      KubernetesConf[KubernetesDriverSpecificConf]
    +        => JavaDriverFeatureStep) =
    +    new JavaDriverFeatureStep(_),
    --- End diff --
    
    Indentation is off.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status success
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3613/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status success
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3088/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21092


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186591946
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilder.scala ---
    @@ -29,18 +31,36 @@ private[spark] class KubernetesDriverBuilder(
           new DriverServiceFeatureStep(_),
         provideSecretsStep: (KubernetesConf[_ <: KubernetesRoleSpecificConf]
           => MountSecretsFeatureStep) =
    -      new MountSecretsFeatureStep(_)) {
    +      new MountSecretsFeatureStep(_),
    +    provideJavaStep: (
    +      KubernetesConf[KubernetesDriverSpecificConf]
    +        => JavaDriverFeatureStep) =
    +    new JavaDriverFeatureStep(_),
    +    providePythonStep: (
    +      KubernetesConf[KubernetesDriverSpecificConf]
    +      => PythonDriverFeatureStep) =
    +      new PythonDriverFeatureStep(_)) {
     
       def buildFromFeatures(
         kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]): KubernetesDriverSpec = {
         val baseFeatures = Seq(
           provideBasicStep(kubernetesConf),
           provideCredentialsStep(kubernetesConf),
           provideServiceStep(kubernetesConf))
    -    val allFeatures = if (kubernetesConf.roleSecretNamesToMountPaths.nonEmpty) {
    -      baseFeatures ++ Seq(provideSecretsStep(kubernetesConf))
    -    } else baseFeatures
    -
    +    val maybeRoleSecretNamesStep = if (kubernetesConf.roleSecretNamesToMountPaths.nonEmpty) {
    +      Some(provideSecretsStep(kubernetesConf)) } else None
    +    val bindingsStep = kubernetesConf.roleSpecificConf.mainAppResource.getOrElse(None)
    --- End diff --
    
    In particular if you don't apply any binding step then I think you'll lose the user's application arguments for Java applications.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status success
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2995/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183156236
  
    --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala ---
    @@ -55,7 +55,8 @@ class KubernetesConfSuite extends SparkFunSuite {
           APP_ID,
           None,
           MAIN_CLASS,
    -      APP_ARGS)
    +      APP_ARGS,
    +      None)
    --- End diff --
    
    Commonly, to improve readability, we assign names to the the None params (e.g. foobaz=None) rather than positional arguments.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #91373 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91373/testReport)** for PR 21092 at commit [`7bedeb6`](https://github.com/apache/spark/commit/7bedeb6ee154136b3462c6b8454efb8a1b56dd5c).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r193796798
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
    @@ -154,6 +176,24 @@ private[spark] object Config extends Logging {
           .checkValue(interval => interval > 0, s"Logging interval must be a positive time value.")
           .createWithDefaultString("1s")
     
    +  val MEMORY_OVERHEAD_FACTOR =
    +    ConfigBuilder("spark.kubernetes.memoryOverheadFactor")
    +      .doc("This sets the Memory Overhead Factor that will allocate memory to non-JVM jobs " +
    +        "which in the case of JVM tasks will default to 0.10 and 0.40 for non-JVM jobs")
    +      .doubleConf
    +      .checkValue(mem_overhead => mem_overhead >= 0 && mem_overhead < 1,
    +        "Ensure that memory overhead is a double between 0 --> 1.0")
    +      .createWithDefault(0.1)
    +
    +  val PYSPARK_MAJOR_PYTHON_VERSION =
    +    ConfigBuilder("spark.kubernetes.pyspark.pythonversion")
    +      .doc("This sets the python version. Either 2 or 3. (Python2 or Python3)")
    +      .stringConf
    +      .checkValue(pv => List("2", "3").contains(pv),
    +        "Ensure that Python Version is either Python2 or Python3")
    +      .createWithDefault("2")
    --- End diff --
    
    No particular reason. I just thought that the major version should default to 2. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status success
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3694/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3837/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183162555
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala ---
    @@ -54,7 +54,7 @@ private[spark] class BasicExecutorFeatureStep(
     
       private val memoryOverheadMiB = kubernetesConf
         .get(EXECUTOR_MEMORY_OVERHEAD)
    -    .getOrElse(math.max((MEMORY_OVERHEAD_FACTOR * executorMemoryMiB).toInt,
    +    .getOrElse(math.max((kubernetesConf.get(MEMORY_OVERHEAD_FACTOR) * executorMemoryMiB).toInt,
    --- End diff --
    
    Yup!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3835/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183153556
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -695,9 +693,17 @@ private[spark] class SparkSubmit extends Logging {
         if (isKubernetesCluster) {
           childMainClass = KUBERNETES_CLUSTER_SUBMIT_CLASS
           if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
    -        childArgs ++= Array("--primary-java-resource", args.primaryResource)
    +        if (args.isPython) {
    --- End diff --
    
    This logic appears to duplicated from YARN, would it make sense to factor this out into a common function?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183156964
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile ---
    @@ -0,0 +1,33 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +ARG base_img
    +FROM $base_img
    +WORKDIR /
    +COPY python /opt/spark/python
    +RUN apk add --no-cache python && \
    +    python -m ensurepip && \
    +    rm -r /usr/lib/python*/ensurepip && \
    +    pip install --upgrade pip setuptools && \
    +    rm -r /root/.cache
    +ENV PYTHON_VERSION 2.7.13
    --- End diff --
    
    So I think it might make sense to build the container with both 2 & 3 since the container might be built by a vendor or cluster administrator and then used by a variety of people. What do folks think?
    
    As for figuring out the env, if we wanted to do it that way we can call the current users python and ask it for its version version information (based on the Spark Python enviroment variables).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91373/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    jenkins, ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2798/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status success
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3612/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186240449
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/bindings/PythonDriverFeatureStep.scala ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.deploy.k8s.features.bindings
    +
    +import io.fabric8.kubernetes.api.model.ContainerBuilder
    +import io.fabric8.kubernetes.api.model.HasMetadata
    +
    +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesRoleSpecificConf, SparkPod}
    +import org.apache.spark.deploy.k8s.Constants._
    +import org.apache.spark.deploy.k8s.KubernetesUtils
    +import org.apache.spark.deploy.k8s.features.KubernetesFeatureConfigStep
    +
    +private[spark] class PythonDriverFeatureStep(
    +  kubernetesConf: KubernetesConf[_ <: KubernetesRoleSpecificConf])
    +  extends KubernetesFeatureConfigStep {
    +  override def configurePod(pod: SparkPod): SparkPod = {
    +    val mainResource = kubernetesConf.pySparkMainResource()
    +    require(mainResource.isDefined, "PySpark Main Resource must be defined")
    +    val otherPyFiles = kubernetesConf.pyFiles().map(pyFile =>
    +      KubernetesUtils.resolveFileUrisAndPath(pyFile.split(","))
    +        .mkString(":")).getOrElse("")
    --- End diff --
    
    Leave a comment that we are switching from "," to ":" to match the format expected by the PYTHONPATH environment variable. ( http://xkcd.com/1987 )


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #91394 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91394/testReport)** for PR 21092 at commit [`24a704e`](https://github.com/apache/spark/commit/24a704e74f2c5816e5ea60dbf607be00c090ae8b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #90424 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90424/testReport)** for PR 21092 at commit [`306f3ed`](https://github.com/apache/spark/commit/306f3edcd6acba89052e92820d433ae9405e8a28).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186298350
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---
    @@ -88,15 +94,22 @@ private[spark] class BasicDriverFeatureStep(
             .addToRequests("memory", driverMemoryQuantity)
             .addToLimits("memory", driverMemoryQuantity)
             .endResources()
    -      .addToArgs("driver")
    +      .addToArgs(driverDockerContainer)
           .addToArgs("--properties-file", SPARK_CONF_PATH)
           .addToArgs("--class", conf.roleSpecificConf.mainClass)
    -      // The user application jar is merged into the spark.jars list and managed through that
    -      // property, so there is no need to reference it explicitly here.
    -      .addToArgs(SparkLauncher.NO_RESOURCE)
    -      .addToArgs(conf.roleSpecificConf.appArgs: _*)
    -      .build()
     
    +    val driverContainer =
    +      if (driverDockerContainer == "driver-py") {
    --- End diff --
    
    Agreed. I didn't know if we wanted to include a JavaDriverFeatureStep. I will do so then. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r189981496
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
    @@ -154,6 +176,24 @@ private[spark] object Config extends Logging {
           .checkValue(interval => interval > 0, s"Logging interval must be a positive time value.")
           .createWithDefaultString("1s")
     
    +  val MEMORY_OVERHEAD_FACTOR =
    +    ConfigBuilder("spark.kubernetes.memoryOverheadFactor")
    +      .doc("This sets the Memory Overhead Factor that will allocate memory to non-JVM jobs " +
    +        "which in the case of JVM tasks will default to 0.10 and 0.40 for non-JVM jobs")
    +      .doubleConf
    +      .checkValue(mem_overhead => mem_overhead >= 0 && mem_overhead < 1,
    +        "Ensure that memory overhead is a double between 0 --> 1.0")
    +      .createOptional
    +
    +  val PYSPARK_PYTHON_VERSION =
    --- End diff --
    
    This is minor, but I have a few questions about this element of the config.
    First of if this is going to be majour version only lets call it something like majourPythonVersion (e.g. many python2 and python3s exist).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r194109752
  
    --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala ---
    @@ -87,11 +89,37 @@ class KubernetesConfSuite extends SparkFunSuite {
           APP_ID,
           None,
           MAIN_CLASS,
    -      APP_ARGS)
    +      APP_ARGS,
    +      None)
         assert(kubernetesConfWithoutMainJar.sparkConf.get("spark.jars").split(",")
           === Array("local:///opt/spark/jar1.jar"))
    +    assert(kubernetesConfWithoutMainJar.sparkConf.get(MEMORY_OVERHEAD_FACTOR) === 0.1)
       }
     
    +  test("Creating driver conf with a python primary file") {
    --- End diff --
    
    Just a follow up we should have a test for with Python and overriding MEMORY_OVERHEAD_FACTOR (e.g. test to make sure that setIfMissing since we had it the other way earlier in the PR).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by foxish <gi...@git.apache.org>.
Github user foxish commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r182522479
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---
    @@ -71,7 +77,7 @@ private[spark] class BasicDriverFeatureStep(
           ("cpu", new QuantityBuilder(false).withAmount(limitCores).build())
         }
     
    -    val driverContainer = new ContainerBuilder(pod.container)
    +    val withoutArgsDriverContainer: ContainerBuilder = new ContainerBuilder(pod.container)
    --- End diff --
    
    The previous name seemed clearer to me. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r192245626
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile ---
    @@ -0,0 +1,34 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +ARG base_img
    +FROM $base_img
    +WORKDIR /
    +RUN mkdir ${SPARK_HOME}/python
    +COPY python/lib ${SPARK_HOME}/python/lib
    +RUN apk add --no-cache python && \
    +    apk add --no-cache python3 && \
    +    python -m ensurepip && \
    +    python3 -m ensurepip && \
    +    rm -r /usr/lib/python*/ensurepip && \
    +    pip install --upgrade pip setuptools && \
    --- End diff --
    
    I will include pip3.6 installation for now until we figure out a long-term venv solution in the next PR


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #91373 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91373/testReport)** for PR 21092 at commit [`7bedeb6`](https://github.com/apache/spark/commit/7bedeb6ee154136b3462c6b8454efb8a1b56dd5c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3182/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r187811483
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala ---
    @@ -54,7 +54,8 @@ private[spark] class BasicExecutorFeatureStep(
     
       private val memoryOverheadMiB = kubernetesConf
         .get(EXECUTOR_MEMORY_OVERHEAD)
    -    .getOrElse(math.max((MEMORY_OVERHEAD_FACTOR * executorMemoryMiB).toInt,
    +    .getOrElse(math.max(
    +      (kubernetesConf.get(MEMORY_OVERHEAD_FACTOR).getOrElse(0.1) * executorMemoryMiB).toInt,
    --- End diff --
    
    Logic is different since it takes in Driver vs. Executor configs to determine memory size. only similar logic is in `kubernetesConf.get(MEMORY_OVERHEAD_FACTOR).getOrElse(0.1)` which seems unnecessary to put into a Utils. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3146/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #90421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90421/testReport)** for PR 21092 at commit [`c59068d`](https://github.com/apache/spark/commit/c59068db8e579ff8fb646d402c33e17c433f7c8f).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #91530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91530/testReport)** for PR 21092 at commit [`6a6d69d`](https://github.com/apache/spark/commit/6a6d69d01be0739716947320a4ad4d80e8eaa115).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #91374 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91374/testReport)** for PR 21092 at commit [`1801e96`](https://github.com/apache/spark/commit/1801e9665dedda4ce1fc4286f49cbcf5ef1b1b0c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r192270208
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -102,17 +110,30 @@ private[spark] object KubernetesConf {
           appId: String,
           mainAppResource: Option[MainAppResource],
           mainClass: String,
    -      appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
    +      appArgs: Array[String],
    +      maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
         val sparkConfWithMainAppJar = sparkConf.clone()
    +    val additionalFiles = mutable.ArrayBuffer.empty[String]
         mainAppResource.foreach {
    -      case JavaMainAppResource(res) =>
    -        val previousJars = sparkConf
    -          .getOption("spark.jars")
    -          .map(_.split(","))
    -          .getOrElse(Array.empty)
    -        if (!previousJars.contains(res)) {
    -          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    -        }
    +        case JavaMainAppResource(res) =>
    +          val previousJars = sparkConf
    +            .getOption("spark.jars")
    +            .map(_.split(","))
    +            .getOrElse(Array.empty)
    +          if (!previousJars.contains(res)) {
    +            sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    +          }
    +        // The function of this outer match is to account for multiple nonJVM
    +        // bindings that will all have increased MEMORY_OVERHEAD_FACTOR to 0.4
    +        case nonJVM: NonJVMResource =>
    +          nonJVM match {
    +            case PythonMainAppResource(res) =>
    +              additionalFiles += res
    +              maybePyFiles.foreach{maybePyFiles =>
    +                additionalFiles.appendAll(maybePyFiles.split(","))}
    +              sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_MAIN_APP_RESOURCE, res)
    +          }
    +          sparkConfWithMainAppJar.set(MEMORY_OVERHEAD_FACTOR, 0.4)
    --- End diff --
    
    Also you probably only want to `setIfMissing`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2975/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    jenkins, ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91537/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status success
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3091/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by foxish <gi...@git.apache.org>.
Github user foxish commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r182523365
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBuilder.scala ---
    @@ -29,9 +30,11 @@ private[spark] class KubernetesExecutorBuilder(
       def buildFromFeatures(
         kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]): SparkPod = {
         val baseFeatures = Seq(provideBasicStep(kubernetesConf))
    -    val allFeatures = if (kubernetesConf.roleSecretNamesToMountPaths.nonEmpty) {
    -      baseFeatures ++ Seq(provideSecretsStep(kubernetesConf))
    -    } else baseFeatures
    +    val maybeRoleSecretNamesStep = if (kubernetesConf.roleSecretNamesToMountPaths.nonEmpty) {
    +      Some(provideSecretsStep(kubernetesConf)) } else None
    +    val allFeatures: Seq[KubernetesFeatureConfigStep] =
    --- End diff --
    
    It does not need any changes/arg passing during executor pod construction?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2995/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r182567225
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBuilder.scala ---
    @@ -29,9 +30,11 @@ private[spark] class KubernetesExecutorBuilder(
       def buildFromFeatures(
         kubernetesConf: KubernetesConf[KubernetesExecutorSpecificConf]): SparkPod = {
         val baseFeatures = Seq(provideBasicStep(kubernetesConf))
    -    val allFeatures = if (kubernetesConf.roleSecretNamesToMountPaths.nonEmpty) {
    -      baseFeatures ++ Seq(provideSecretsStep(kubernetesConf))
    -    } else baseFeatures
    +    val maybeRoleSecretNamesStep = if (kubernetesConf.roleSecretNamesToMountPaths.nonEmpty) {
    +      Some(provideSecretsStep(kubernetesConf)) } else None
    +    val allFeatures: Seq[KubernetesFeatureConfigStep] =
    --- End diff --
    
    No, but there will be more features and I thought that doing options in the description of `allFeatures` was cleaner


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3612/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #90424 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90424/testReport)** for PR 21092 at commit [`306f3ed`](https://github.com/apache/spark/commit/306f3edcd6acba89052e92820d433ae9405e8a28).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r182567081
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---
    @@ -71,7 +77,7 @@ private[spark] class BasicDriverFeatureStep(
           ("cpu", new QuantityBuilder(false).withAmount(limitCores).build())
         }
     
    -    val driverContainer = new ContainerBuilder(pod.container)
    +    val withoutArgsDriverContainer: ContainerBuilder = new ContainerBuilder(pod.container)
    --- End diff --
    
    Yes. look below


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r187647601
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile ---
    @@ -0,0 +1,34 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +ARG base_img
    +FROM $base_img
    +WORKDIR /
    +RUN mkdir ${SPARK_HOME}/python
    +COPY python/lib ${SPARK_HOME}/python/lib
    +RUN apk add --no-cache python && \
    +    apk add --no-cache python3 && \
    +    python -m ensurepip && \
    +    python3 -m ensurepip && \
    +    rm -r /usr/lib/python*/ensurepip && \
    +    pip install --upgrade pip setuptools && \
    --- End diff --
    
    So you can run both pip and pip3 with the same packages and they'll get installed in different directories and shouldn't stomp on top of eachother. That being said long term venvs are probably the way we want to go, but as we've discussed those are probably non-trivial and should go in a second PR.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3242/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186793604
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---
    @@ -44,11 +44,16 @@ private[spark] class BasicDriverFeatureStep(
       private val driverCpuCores = conf.get("spark.driver.cores", "1")
       private val driverLimitCores = conf.get(KUBERNETES_DRIVER_LIMIT_CORES)
     
    +  private val driverDockerContainer = conf.roleSpecificConf.mainAppResource.map {
    +    case JavaMainAppResource(_) => "driver"
    +    case PythonMainAppResource(_) => "driver-py"
    +  }.getOrElse(throw new SparkException("Must specify a JVM or Python Resource"))
    --- End diff --
    
    Should I therefore not throw an error here @mccheah and move this logic into the steps? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186239895
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---
    @@ -71,7 +77,7 @@ private[spark] class BasicDriverFeatureStep(
           ("cpu", new QuantityBuilder(false).withAmount(limitCores).build())
         }
     
    -    val driverContainer = new ContainerBuilder(pod.container)
    +    val withoutArgsDriverContainer: ContainerBuilder = new ContainerBuilder(pod.container)
    --- End diff --
    
    But we _do_ set arguments on this one right? If not please insert a white space so I can see the different visually.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2997/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #91573 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91573/testReport)** for PR 21092 at commit [`a61d897`](https://github.com/apache/spark/commit/a61d8973a8961fa69e50757f049b04bda292a088).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status success
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2997/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183154459
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -101,17 +109,29 @@ private[spark] object KubernetesConf {
           appId: String,
           mainAppResource: Option[MainAppResource],
           mainClass: String,
    -      appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
    +      appArgs: Array[String],
    +      maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
         val sparkConfWithMainAppJar = sparkConf.clone()
    +    val additionalFiles = mutable.ArrayBuffer.empty[String]
         mainAppResource.foreach {
    -      case JavaMainAppResource(res) =>
    -        val previousJars = sparkConf
    -          .getOption("spark.jars")
    -          .map(_.split(","))
    -          .getOrElse(Array.empty)
    -        if (!previousJars.contains(res)) {
    -          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    -        }
    +        case JavaMainAppResource(res) =>
    +          val previousJars = sparkConf
    +            .getOption("spark.jars")
    +            .map(_.split(","))
    +            .getOrElse(Array.empty)
    +          if (!previousJars.contains(res)) {
    +            sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    +          }
    +        case nonJVM: NonJVMResource =>
    +          nonJVM match {
    +            case PythonMainAppResource(res) =>
    +              additionalFiles += res
    +              maybePyFiles.foreach{maybePyFiles =>
    +                additionalFiles.appendAll(maybePyFiles.split(","))}
    +              sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_MAIN_APP_RESOURCE, res)
    +              sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_APP_ARGS, appArgs.mkString(" "))
    +          }
    +          sparkConfWithMainAppJar.set(MEMORY_OVERHEAD_FACTOR, 0.4)
    --- End diff --
    
    So wait, if the user has specified a different value I don't think we should override it and its not clear to me that this code will not override a user specified value.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r192457908
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -102,17 +110,30 @@ private[spark] object KubernetesConf {
           appId: String,
           mainAppResource: Option[MainAppResource],
           mainClass: String,
    -      appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
    +      appArgs: Array[String],
    +      maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
         val sparkConfWithMainAppJar = sparkConf.clone()
    +    val additionalFiles = mutable.ArrayBuffer.empty[String]
         mainAppResource.foreach {
    -      case JavaMainAppResource(res) =>
    -        val previousJars = sparkConf
    -          .getOption("spark.jars")
    -          .map(_.split(","))
    -          .getOrElse(Array.empty)
    -        if (!previousJars.contains(res)) {
    -          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    -        }
    +        case JavaMainAppResource(res) =>
    +          val previousJars = sparkConf
    +            .getOption("spark.jars")
    +            .map(_.split(","))
    +            .getOrElse(Array.empty)
    +          if (!previousJars.contains(res)) {
    +            sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    +          }
    +        // The function of this outer match is to account for multiple nonJVM
    +        // bindings that will all have increased MEMORY_OVERHEAD_FACTOR to 0.4
    +        case nonJVM: NonJVMResource =>
    +          nonJVM match {
    +            case PythonMainAppResource(res) =>
    +              additionalFiles += res
    +              maybePyFiles.foreach{maybePyFiles =>
    +                additionalFiles.appendAll(maybePyFiles.split(","))}
    +              sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_MAIN_APP_RESOURCE, res)
    +          }
    +          sparkConfWithMainAppJar.set(MEMORY_OVERHEAD_FACTOR, 0.4)
    --- End diff --
    
    I think that power users would want the ability to try to overwrite this if they have a specific amount of memory overhead that they want and know that they need. Configurations should always be configurable, with defaults that are sane. I agree that we can afford to set a better default for Kubernetes, but there should always be a way to override default settings if the user knows the characteristics of their job. For example if the user does memory profiling of their container and sees that it's not using the full amount of memory, they can afford to drop this value and leave more resources for other applications.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183157702
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile ---
    @@ -0,0 +1,33 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +ARG base_img
    +FROM $base_img
    +WORKDIR /
    +COPY python /opt/spark/python
    +RUN apk add --no-cache python && \
    +    python -m ensurepip && \
    +    rm -r /usr/lib/python*/ensurepip && \
    +    pip install --upgrade pip setuptools && \
    +    rm -r /root/.cache
    +ENV PYTHON_VERSION 2.7.13
    +ENV PYSPARK_PYTHON python
    +ENV PYSPARK_DRIVER_PYTHON python
    +ENV PYTHONPATH ${SPARK_HOME}/python/:${SPARK_HOME}/python/lib/py4j-0.10.6-src.zip:${PYTHONPATH}
    --- End diff --
    
    We're going to need to mention the Py4J zip file needs to be updated here as well :(
    Also open question if we want the PySpark.zip file in here instead of the python/, and or if we're trying to make "slim" images if we want to delete that zip file.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3857/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #90561 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90561/testReport)** for PR 21092 at commit [`72953a3`](https://github.com/apache/spark/commit/72953a3ef42ce0aa0d4b55c0f213198b4b468907).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183163821
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh ---
    @@ -62,6 +69,14 @@ case "$SPARK_K8S_CMD" in
           "$@"
         )
         ;;
    +  driver-py)
    +    CMD=(
    +      "$SPARK_HOME/bin/spark-submit"
    +      --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
    +      --deploy-mode client
    +      "$@" $PYSPARK_PRIMARY $PYSPARK_SECONDARY
    --- End diff --
    
    @holdenk I thought the PythonRunner takes in a comma delineated string of PyFiles. as an argument which is why I set it to be `--class PythonRunner $PYSPARK_PRIMARY $PYSPARK_FILES $PYSPARK_DRIVER_ARGS`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90561/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by foxish <gi...@git.apache.org>.
Github user foxish commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r182517091
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh ---
    @@ -62,6 +69,14 @@ case "$SPARK_K8S_CMD" in
           "$@"
         )
         ;;
    +  driver-py)
    +    CMD=(
    +      "$SPARK_HOME/bin/spark-submit"
    +      --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
    +      --deploy-mode client
    +      "$@" $PYSPARK_PRIMARY $PYSPARK_SECONDARY
    --- End diff --
    
    Can we have more descriptive names for `PYSPARK_PRIMARY` and `PYSPARK_SECONDARY`? Maybe `PYSPARK_MAINAPP` and `PYSPARK_ARGS`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3613/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183157311
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile ---
    @@ -0,0 +1,33 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +ARG base_img
    +FROM $base_img
    +WORKDIR /
    +COPY python /opt/spark/python
    +RUN apk add --no-cache python && \
    +    python -m ensurepip && \
    +    rm -r /usr/lib/python*/ensurepip && \
    --- End diff --
    
    Can we add a comment about why this part?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r182567449
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---
    @@ -88,15 +94,22 @@ private[spark] class BasicDriverFeatureStep(
             .addToRequests("memory", driverMemoryQuantity)
             .addToLimits("memory", driverMemoryQuantity)
             .endResources()
    -      .addToArgs("driver")
    +      .addToArgs(driverDockerContainer)
           .addToArgs("--properties-file", SPARK_CONF_PATH)
           .addToArgs("--class", conf.roleSpecificConf.mainClass)
    -      // The user application jar is merged into the spark.jars list and managed through that
    -      // property, so there is no need to reference it explicitly here.
    -      .addToArgs(SparkLauncher.NO_RESOURCE)
    -      .addToArgs(conf.roleSpecificConf.appArgs: _*)
    -      .build()
     
    +    val driverContainer =
    +      if (driverDockerContainer == "driver-py") {
    --- End diff --
    
    We can check the appResource but that was already done. I thought it would be overkill to check twice since it was already handled in setting `driverDockerContainer`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2993/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by kokes <gi...@git.apache.org>.
Github user kokes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r193639554
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
    @@ -154,6 +176,24 @@ private[spark] object Config extends Logging {
           .checkValue(interval => interval > 0, s"Logging interval must be a positive time value.")
           .createWithDefaultString("1s")
     
    +  val MEMORY_OVERHEAD_FACTOR =
    +    ConfigBuilder("spark.kubernetes.memoryOverheadFactor")
    +      .doc("This sets the Memory Overhead Factor that will allocate memory to non-JVM jobs " +
    +        "which in the case of JVM tasks will default to 0.10 and 0.40 for non-JVM jobs")
    +      .doubleConf
    +      .checkValue(mem_overhead => mem_overhead >= 0 && mem_overhead < 1,
    +        "Ensure that memory overhead is a double between 0 --> 1.0")
    +      .createWithDefault(0.1)
    +
    +  val PYSPARK_MAJOR_PYTHON_VERSION =
    +    ConfigBuilder("spark.kubernetes.pyspark.pythonversion")
    +      .doc("This sets the python version. Either 2 or 3. (Python2 or Python3)")
    +      .stringConf
    +      .checkValue(pv => List("2", "3").contains(pv),
    +        "Ensure that Python Version is either Python2 or Python3")
    +      .createWithDefault("2")
    --- End diff --
    
    Am I reading this right that the default is Python 2? Is there a reason for that? Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #90660 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90660/testReport)** for PR 21092 at commit [`72953a3`](https://github.com/apache/spark/commit/72953a3ef42ce0aa0d4b55c0f213198b4b468907).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    jenkins, retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91394/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186591836
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilder.scala ---
    @@ -29,18 +31,36 @@ private[spark] class KubernetesDriverBuilder(
           new DriverServiceFeatureStep(_),
         provideSecretsStep: (KubernetesConf[_ <: KubernetesRoleSpecificConf]
           => MountSecretsFeatureStep) =
    -      new MountSecretsFeatureStep(_)) {
    +      new MountSecretsFeatureStep(_),
    +    provideJavaStep: (
    +      KubernetesConf[KubernetesDriverSpecificConf]
    +        => JavaDriverFeatureStep) =
    +    new JavaDriverFeatureStep(_),
    +    providePythonStep: (
    +      KubernetesConf[KubernetesDriverSpecificConf]
    +      => PythonDriverFeatureStep) =
    +      new PythonDriverFeatureStep(_)) {
     
       def buildFromFeatures(
         kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf]): KubernetesDriverSpec = {
         val baseFeatures = Seq(
           provideBasicStep(kubernetesConf),
           provideCredentialsStep(kubernetesConf),
           provideServiceStep(kubernetesConf))
    -    val allFeatures = if (kubernetesConf.roleSecretNamesToMountPaths.nonEmpty) {
    -      baseFeatures ++ Seq(provideSecretsStep(kubernetesConf))
    -    } else baseFeatures
    -
    +    val maybeRoleSecretNamesStep = if (kubernetesConf.roleSecretNamesToMountPaths.nonEmpty) {
    +      Some(provideSecretsStep(kubernetesConf)) } else None
    +    val bindingsStep = kubernetesConf.roleSpecificConf.mainAppResource.getOrElse(None)
    --- End diff --
    
    `getOrElse(None)` is an anti-pattern, I think. We should be able to match directly on the Option itself. Also, if you don't have a main app resource, should the application be treated as a Java application? I would think that's what `SparkLauncher.NO_RESOURCE` would imply.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    @ifilonenko ping on this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186239751
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -101,17 +112,29 @@ private[spark] object KubernetesConf {
           appId: String,
           mainAppResource: Option[MainAppResource],
           mainClass: String,
    -      appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
    +      appArgs: Array[String],
    +      maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
         val sparkConfWithMainAppJar = sparkConf.clone()
    +    val additionalFiles = mutable.ArrayBuffer.empty[String]
         mainAppResource.foreach {
    -      case JavaMainAppResource(res) =>
    -        val previousJars = sparkConf
    -          .getOption("spark.jars")
    -          .map(_.split(","))
    -          .getOrElse(Array.empty)
    -        if (!previousJars.contains(res)) {
    -          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    -        }
    +        case JavaMainAppResource(res) =>
    +          val previousJars = sparkConf
    +            .getOption("spark.jars")
    +            .map(_.split(","))
    +            .getOrElse(Array.empty)
    +          if (!previousJars.contains(res)) {
    +            sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    +          }
    +        case nonJVM: NonJVMResource =>
    +          nonJVM match {
    +            case PythonMainAppResource(res) =>
    +              additionalFiles += res
    +              maybePyFiles.foreach{maybePyFiles =>
    +                additionalFiles.appendAll(maybePyFiles.split(","))}
    +              sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_MAIN_APP_RESOURCE, res)
    +              sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_APP_ARGS, appArgs.mkString(" "))
    +          }
    +          sparkConfWithMainAppJar.setIfMissing(MEMORY_OVERHEAD_FACTOR, 0.4)
    --- End diff --
    
    Do we want to set this in the JVM case?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status success
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3712/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186240772
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/bindings/PythonDriverFeatureStep.scala ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.deploy.k8s.features.bindings
    +
    +import io.fabric8.kubernetes.api.model.ContainerBuilder
    +import io.fabric8.kubernetes.api.model.HasMetadata
    +
    +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesRoleSpecificConf, SparkPod}
    +import org.apache.spark.deploy.k8s.Constants._
    +import org.apache.spark.deploy.k8s.KubernetesUtils
    +import org.apache.spark.deploy.k8s.features.KubernetesFeatureConfigStep
    +
    +private[spark] class PythonDriverFeatureStep(
    +  kubernetesConf: KubernetesConf[_ <: KubernetesRoleSpecificConf])
    +  extends KubernetesFeatureConfigStep {
    +  override def configurePod(pod: SparkPod): SparkPod = {
    +    val mainResource = kubernetesConf.pySparkMainResource()
    +    require(mainResource.isDefined, "PySpark Main Resource must be defined")
    +    val otherPyFiles = kubernetesConf.pyFiles().map(pyFile =>
    +      KubernetesUtils.resolveFileUrisAndPath(pyFile.split(","))
    +        .mkString(":")).getOrElse("")
    +    val withPythonPrimaryFileContainer = new ContainerBuilder(pod.container)
    +      .addNewEnv()
    +        .withName(ENV_PYSPARK_ARGS)
    +        .withValue(kubernetesConf.pySparkAppArgs().getOrElse(""))
    +        .endEnv()
    +      .addNewEnv()
    +        .withName(ENV_PYSPARK_PRIMARY)
    +        .withValue(KubernetesUtils.resolveFileUri(mainResource.get))
    +        .endEnv()
    +      .addNewEnv()
    +        .withName(ENV_PYSPARK_FILES)
    +        .withValue(if (otherPyFiles == "") {""} else otherPyFiles)
    --- End diff --
    
    wait, what is this logic?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #91530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91530/testReport)** for PR 21092 at commit [`6a6d69d`](https://github.com/apache/spark/commit/6a6d69d01be0739716947320a4ad4d80e8eaa115).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183162778
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -101,17 +109,29 @@ private[spark] object KubernetesConf {
           appId: String,
           mainAppResource: Option[MainAppResource],
           mainClass: String,
    -      appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
    +      appArgs: Array[String],
    +      maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
         val sparkConfWithMainAppJar = sparkConf.clone()
    +    val additionalFiles = mutable.ArrayBuffer.empty[String]
         mainAppResource.foreach {
    -      case JavaMainAppResource(res) =>
    -        val previousJars = sparkConf
    -          .getOption("spark.jars")
    -          .map(_.split(","))
    -          .getOrElse(Array.empty)
    -        if (!previousJars.contains(res)) {
    -          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    -        }
    +        case JavaMainAppResource(res) =>
    +          val previousJars = sparkConf
    +            .getOption("spark.jars")
    +            .map(_.split(","))
    +            .getOrElse(Array.empty)
    +          if (!previousJars.contains(res)) {
    +            sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    +          }
    +        case nonJVM: NonJVMResource =>
    +          nonJVM match {
    +            case PythonMainAppResource(res) =>
    +              additionalFiles += res
    +              maybePyFiles.foreach{maybePyFiles =>
    +                additionalFiles.appendAll(maybePyFiles.split(","))}
    +              sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_MAIN_APP_RESOURCE, res)
    +              sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_APP_ARGS, appArgs.mkString(" "))
    +          }
    +          sparkConfWithMainAppJar.set(MEMORY_OVERHEAD_FACTOR, 0.4)
    --- End diff --
    
    Very true, will need to ensure that it does not override the set value


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3712/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r187644556
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -101,17 +112,29 @@ private[spark] object KubernetesConf {
           appId: String,
           mainAppResource: Option[MainAppResource],
           mainClass: String,
    -      appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
    +      appArgs: Array[String],
    +      maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
         val sparkConfWithMainAppJar = sparkConf.clone()
    +    val additionalFiles = mutable.ArrayBuffer.empty[String]
         mainAppResource.foreach {
    -      case JavaMainAppResource(res) =>
    -        val previousJars = sparkConf
    -          .getOption("spark.jars")
    -          .map(_.split(","))
    -          .getOrElse(Array.empty)
    -        if (!previousJars.contains(res)) {
    -          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    -        }
    +        case JavaMainAppResource(res) =>
    +          val previousJars = sparkConf
    +            .getOption("spark.jars")
    +            .map(_.split(","))
    +            .getOrElse(Array.empty)
    +          if (!previousJars.contains(res)) {
    +            sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    +          }
    +        case nonJVM: NonJVMResource =>
    --- End diff --
    
    Maybe worth a comment then? Especially since R support isn't integrated right now it's perhaps not super clear to folks why this is being done.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3749/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #90421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90421/testReport)** for PR 21092 at commit [`c59068d`](https://github.com/apache/spark/commit/c59068db8e579ff8fb646d402c33e17c433f7c8f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183157419
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile ---
    @@ -0,0 +1,33 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +ARG base_img
    +FROM $base_img
    +WORKDIR /
    +COPY python /opt/spark/python
    +RUN apk add --no-cache python && \
    +    python -m ensurepip && \
    +    rm -r /usr/lib/python*/ensurepip && \
    +    pip install --upgrade pip setuptools && \
    +    rm -r /root/.cache
    --- End diff --
    
    Is this just being done for space reasons?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90660/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183153738
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
    @@ -154,6 +176,13 @@ private[spark] object Config extends Logging {
           .checkValue(interval => interval > 0, s"Logging interval must be a positive time value.")
           .createWithDefaultString("1s")
     
    +  val MEMORY_OVERHEAD_FACTOR =
    +    ConfigBuilder("spark.kubernetes.memoryOverheadFactor")
    +      .doc("This sets the Memory Overhead Factor that will allocate memory to non-JVM jobs " +
    +        "which in the case of JVM tasks will default to 0.10 and 0.40 for non-JVM jobs")
    --- End diff --
    
    +1 to this thanks for adding this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186237367
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -695,9 +693,17 @@ private[spark] class SparkSubmit extends Logging {
         if (isKubernetesCluster) {
           childMainClass = KUBERNETES_CLUSTER_SUBMIT_CLASS
           if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
    -        childArgs ++= Array("--primary-java-resource", args.primaryResource)
    +        if (args.isPython) {
    --- End diff --
    
    We chatted about this off-line and while its close its not exactly the same so we can deal with minor parts of duplication for now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2729/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90421/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r193913624
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
    @@ -154,6 +176,24 @@ private[spark] object Config extends Logging {
           .checkValue(interval => interval > 0, s"Logging interval must be a positive time value.")
           .createWithDefaultString("1s")
     
    +  val MEMORY_OVERHEAD_FACTOR =
    +    ConfigBuilder("spark.kubernetes.memoryOverheadFactor")
    +      .doc("This sets the Memory Overhead Factor that will allocate memory to non-JVM jobs " +
    +        "which in the case of JVM tasks will default to 0.10 and 0.40 for non-JVM jobs")
    +      .doubleConf
    +      .checkValue(mem_overhead => mem_overhead >= 0 && mem_overhead < 1,
    +        "Ensure that memory overhead is a double between 0 --> 1.0")
    +      .createWithDefault(0.1)
    +
    +  val PYSPARK_MAJOR_PYTHON_VERSION =
    +    ConfigBuilder("spark.kubernetes.pyspark.pythonversion")
    +      .doc("This sets the python version. Either 2 or 3. (Python2 or Python3)")
    +      .stringConf
    +      .checkValue(pv => List("2", "3").contains(pv),
    +        "Ensure that Python Version is either Python2 or Python3")
    +      .createWithDefault("2")
    --- End diff --
    
    I am willing to do that: thoughts @holdenk ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    @erikerlandson I think pulling from URI is fine for now. The actual comment was just focused on the usage of spark-submit in that case, but I agree longer term we should think about dependencies, especially things which can't just be shipped as zip or pyfiles (but I think that is vNext).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status success
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2894/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #91374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91374/testReport)** for PR 21092 at commit [`1801e96`](https://github.com/apache/spark/commit/1801e9665dedda4ce1fc4286f49cbcf5ef1b1b0c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186329528
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile ---
    @@ -0,0 +1,34 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +ARG base_img
    +FROM $base_img
    +WORKDIR /
    +RUN mkdir ${SPARK_HOME}/python
    +COPY python/lib ${SPARK_HOME}/python/lib
    +RUN apk add --no-cache python && \
    +    apk add --no-cache python3 && \
    +    python -m ensurepip && \
    +    python3 -m ensurepip && \
    +    rm -r /usr/lib/python*/ensurepip && \
    +    pip install --upgrade pip setuptools && \
    --- End diff --
    
    Correct. Would love recommendations on dependency management in regards to ‘pip’ as it’s tricky to allow for both pip installation and pip3 installation. Unless I use two separate virtual environments for dependency management 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status success
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2993/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Other not directly related to the code feedback is in the example I would expect sort to be passed as an argument to pi from the quick reading of it and also from using just regular spark submit in local mode so I wouldn't expect spark-submit to not treat it as an argument. Are you just looking to add sort.py as to the users python path so it's included as a resource? If so I think updating the env variables or using --py-files is the way to go. If I've missunderstood that question/example though no stress :)
    
    And thank you so much for working on this I'm super excited to see the progress. Sorry for only the quick first-pass review but I figured since its a work in progress that is what you are looking for. If you want more detailed feedback please ping me :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status success
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2973/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91573/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3692/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    @lucashu1 please send your question to stackoverflow or user@spark.apache.org!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r190980379
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -63,10 +67,16 @@ private[spark] case class KubernetesConf[T <: KubernetesRoleSpecificConf](
         .map(str => str.split(",").toSeq)
         .getOrElse(Seq.empty[String])
     
    -  def sparkFiles(): Seq[String] = sparkConf
    -    .getOption("spark.files")
    -    .map(str => str.split(",").toSeq)
    -    .getOrElse(Seq.empty[String])
    +  def getRoleConf: T = roleSpecificConf
    --- End diff --
    
    Don't think we use this anywhere?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r190981207
  
    --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/bindings/PythonDriverFeatureStepSuite.scala ---
    @@ -0,0 +1,107 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.deploy.k8s.features.bindings
    +
    +import scala.collection.JavaConverters._
    +
    +import org.apache.spark.{SparkConf, SparkFunSuite}
    +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesDriverSpecificConf, SparkPod}
    +import org.apache.spark.deploy.k8s.Config._
    +import org.apache.spark.deploy.k8s.Constants._
    +import org.apache.spark.deploy.k8s.submit.PythonMainAppResource
    +
    +class PythonDriverFeatureStepSuite extends SparkFunSuite {
    +
    --- End diff --
    
    Nit: there's an extra newline here


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3082/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3088/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r182566963
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile ---
    @@ -0,0 +1,33 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +ARG base_img
    +FROM $base_img
    +WORKDIR /
    +COPY python /opt/spark/python
    +RUN apk add --no-cache python && \
    +    python -m ensurepip && \
    +    rm -r /usr/lib/python*/ensurepip && \
    +    pip install --upgrade pip setuptools && \
    +    rm -r /root/.cache
    +ENV PYTHON_VERSION 2.7.13
    --- End diff --
    
    That is what I brought up in the PR description. And why this still a WIP. I need to investigate the proper way to determine whether we ship these containers with Python2 or Python3. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by erikerlandson <gi...@git.apache.org>.
Github user erikerlandson commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r182564020
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---
    @@ -71,7 +77,7 @@ private[spark] class BasicDriverFeatureStep(
           ("cpu", new QuantityBuilder(false).withAmount(limitCores).build())
         }
     
    -    val driverContainer = new ContainerBuilder(pod.container)
    +    val withoutArgsDriverContainer: ContainerBuilder = new ContainerBuilder(pod.container)
    --- End diff --
    
    is there a corresponding driver container _with_ args?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status success
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2895/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2894/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3623/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3186/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183155443
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala ---
    @@ -54,7 +54,7 @@ private[spark] class BasicExecutorFeatureStep(
     
       private val memoryOverheadMiB = kubernetesConf
         .get(EXECUTOR_MEMORY_OVERHEAD)
    -    .getOrElse(math.max((MEMORY_OVERHEAD_FACTOR * executorMemoryMiB).toInt,
    +    .getOrElse(math.max((kubernetesConf.get(MEMORY_OVERHEAD_FACTOR) * executorMemoryMiB).toInt,
    --- End diff --
    
    So this won't allow people to set a lower overhead than MEMORY_OVERHEAD_MIN_MIB, is that the goal?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...

Posted by foxish <gi...@git.apache.org>.
Github user foxish commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Thanks for taking this on @ifilonenko. Left some initial comments on the PR without going too much in depth - since as you noted, it's WIP.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186321782
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -63,10 +67,17 @@ private[spark] case class KubernetesConf[T <: KubernetesRoleSpecificConf](
         .map(str => str.split(",").toSeq)
         .getOrElse(Seq.empty[String])
     
    -  def sparkFiles(): Seq[String] = sparkConf
    -    .getOption("spark.files")
    -    .map(str => str.split(",").toSeq)
    -    .getOrElse(Seq.empty[String])
    +  def pyFiles(): Option[String] = sparkConf
    +    .get(KUBERNETES_PYSPARK_PY_FILES)
    +
    +  def pySparkMainResource(): Option[String] = sparkConf
    --- End diff --
    
    I need to parse out the MainAppResource (which I thought we should be doing only once... as such, I thought it would be cleaner to do this... 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186326478
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile ---
    @@ -0,0 +1,34 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +ARG base_img
    +FROM $base_img
    +WORKDIR /
    +RUN mkdir ${SPARK_HOME}/python
    +COPY python/lib ${SPARK_HOME}/python/lib
    +RUN apk add --no-cache python && \
    +    apk add --no-cache python3 && \
    +    python -m ensurepip && \
    +    python3 -m ensurepip && \
    +    rm -r /usr/lib/python*/ensurepip && \
    +    pip install --upgrade pip setuptools && \
    --- End diff --
    
    this goes to python2 only, I think?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r187646183
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala ---
    @@ -54,7 +54,8 @@ private[spark] class BasicExecutorFeatureStep(
     
       private val memoryOverheadMiB = kubernetesConf
         .get(EXECUTOR_MEMORY_OVERHEAD)
    -    .getOrElse(math.max((MEMORY_OVERHEAD_FACTOR * executorMemoryMiB).toInt,
    +    .getOrElse(math.max(
    +      (kubernetesConf.get(MEMORY_OVERHEAD_FACTOR).getOrElse(0.1) * executorMemoryMiB).toInt,
    --- End diff --
    
    This logic happens twice, and I'd rather see it in a utils or config class where if we update it in one place it will take effect everywhere.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183156426
  
    --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala ---
    @@ -87,11 +89,37 @@ class KubernetesConfSuite extends SparkFunSuite {
           APP_ID,
           None,
           MAIN_CLASS,
    -      APP_ARGS)
    +      APP_ARGS,
    +      None)
         assert(kubernetesConfWithoutMainJar.sparkConf.get("spark.jars").split(",")
           === Array("local:///opt/spark/jar1.jar"))
    +    assert(kubernetesConfWithoutMainJar.sparkConf.get(MEMORY_OVERHEAD_FACTOR) === 0.1)
       }
     
    +  test("Creating driver conf with a python primary file") {
    --- End diff --
    
    Would like also see a unit test for with a PyFile and an overriden memory overhead.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by foxish <gi...@git.apache.org>.
Github user foxish commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    jenkins, retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186240207
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---
    @@ -88,15 +94,22 @@ private[spark] class BasicDriverFeatureStep(
             .addToRequests("memory", driverMemoryQuantity)
             .addToLimits("memory", driverMemoryQuantity)
             .endResources()
    -      .addToArgs("driver")
    +      .addToArgs(driverDockerContainer)
           .addToArgs("--properties-file", SPARK_CONF_PATH)
           .addToArgs("--class", conf.roleSpecificConf.mainClass)
    -      // The user application jar is merged into the spark.jars list and managed through that
    -      // property, so there is no need to reference it explicitly here.
    -      .addToArgs(SparkLauncher.NO_RESOURCE)
    -      .addToArgs(conf.roleSpecificConf.appArgs: _*)
    -      .build()
     
    +    val driverContainer =
    +      if (driverDockerContainer == "driver-py") {
    --- End diff --
    
    Sorry, I was forgot that folks could specify the driver container separately from the worker container nvm.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186266469
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -101,17 +112,29 @@ private[spark] object KubernetesConf {
           appId: String,
           mainAppResource: Option[MainAppResource],
           mainClass: String,
    -      appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
    +      appArgs: Array[String],
    +      maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
         val sparkConfWithMainAppJar = sparkConf.clone()
    +    val additionalFiles = mutable.ArrayBuffer.empty[String]
         mainAppResource.foreach {
    -      case JavaMainAppResource(res) =>
    -        val previousJars = sparkConf
    -          .getOption("spark.jars")
    -          .map(_.split(","))
    -          .getOrElse(Array.empty)
    -        if (!previousJars.contains(res)) {
    -          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    -        }
    +        case JavaMainAppResource(res) =>
    +          val previousJars = sparkConf
    +            .getOption("spark.jars")
    +            .map(_.split(","))
    +            .getOrElse(Array.empty)
    +          if (!previousJars.contains(res)) {
    +            sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    +          }
    +        case nonJVM: NonJVMResource =>
    --- End diff --
    
    Because the R step should have the same amount of default MemoryOverhead. As should all NonJVMResources. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by foxish <gi...@git.apache.org>.
Github user foxish commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r182524173
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---
    @@ -88,15 +94,22 @@ private[spark] class BasicDriverFeatureStep(
             .addToRequests("memory", driverMemoryQuantity)
             .addToLimits("memory", driverMemoryQuantity)
             .endResources()
    -      .addToArgs("driver")
    +      .addToArgs(driverDockerContainer)
           .addToArgs("--properties-file", SPARK_CONF_PATH)
           .addToArgs("--class", conf.roleSpecificConf.mainClass)
    -      // The user application jar is merged into the spark.jars list and managed through that
    -      // property, so there is no need to reference it explicitly here.
    -      .addToArgs(SparkLauncher.NO_RESOURCE)
    -      .addToArgs(conf.roleSpecificConf.appArgs: _*)
    -      .build()
     
    +    val driverContainer =
    +      if (driverDockerContainer == "driver-py") {
    --- End diff --
    
    Wondering if we can discover if it's a Python application in a better way here. Probably using the built up spark conf?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186254991
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -101,17 +112,29 @@ private[spark] object KubernetesConf {
           appId: String,
           mainAppResource: Option[MainAppResource],
           mainClass: String,
    -      appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
    +      appArgs: Array[String],
    +      maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
         val sparkConfWithMainAppJar = sparkConf.clone()
    +    val additionalFiles = mutable.ArrayBuffer.empty[String]
         mainAppResource.foreach {
    -      case JavaMainAppResource(res) =>
    -        val previousJars = sparkConf
    -          .getOption("spark.jars")
    -          .map(_.split(","))
    -          .getOrElse(Array.empty)
    -        if (!previousJars.contains(res)) {
    -          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    -        }
    +        case JavaMainAppResource(res) =>
    +          val previousJars = sparkConf
    +            .getOption("spark.jars")
    +            .map(_.split(","))
    +            .getOrElse(Array.empty)
    +          if (!previousJars.contains(res)) {
    +            sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    +          }
    +        case nonJVM: NonJVMResource =>
    +          nonJVM match {
    +            case PythonMainAppResource(res) =>
    +              additionalFiles += res
    +              maybePyFiles.foreach{maybePyFiles =>
    +                additionalFiles.appendAll(maybePyFiles.split(","))}
    +              sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_MAIN_APP_RESOURCE, res)
    +              sparkConfWithMainAppJar.set(KUBERNETES_PYSPARK_APP_ARGS, appArgs.mkString(" "))
    +          }
    +          sparkConfWithMainAppJar.setIfMissing(MEMORY_OVERHEAD_FACTOR, 0.4)
    --- End diff --
    
    This is set later in BaseDriverStep


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186244105
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -63,10 +67,17 @@ private[spark] case class KubernetesConf[T <: KubernetesRoleSpecificConf](
         .map(str => str.split(",").toSeq)
         .getOrElse(Seq.empty[String])
     
    -  def sparkFiles(): Seq[String] = sparkConf
    -    .getOption("spark.files")
    -    .map(str => str.split(",").toSeq)
    -    .getOrElse(Seq.empty[String])
    +  def pyFiles(): Option[String] = sparkConf
    +    .get(KUBERNETES_PYSPARK_PY_FILES)
    +
    +  def pySparkMainResource(): Option[String] = sparkConf
    --- End diff --
    
    This seems redundant with the driver specific spark conf's MainAppResource. Perhaps remove the need to specify this thing twice?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status success
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3623/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r190981394
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh ---
    @@ -53,6 +53,28 @@ if [ -n "$SPARK_MOUNTED_FILES_DIR" ]; then
       cp -R "$SPARK_MOUNTED_FILES_DIR/." .
     fi
     
    +if [ -n "$PYSPARK_FILES" ]; then
    +    PYTHONPATH="$PYTHONPATH:$PYSPARK_FILES"
    +fi
    +
    +PYSPARK_ARGS=""
    +if [ -n "$PYSPARK_APP_ARGS" ]; then
    +    PYSPARK_ARGS="$PYSPARK_APP_ARGS"
    +fi
    +
    +
    +if [ "$PYSPARK_PYTHON_VERSION" == "2" ]; then
    --- End diff --
    
    Should be fine if we don't set this at all, yeah?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by foxish <gi...@git.apache.org>.
Github user foxish commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r182520090
  
    --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilderSuite.scala ---
    @@ -89,6 +97,29 @@ class KubernetesDriverBuilderSuite extends SparkFunSuite {
           SECRETS_STEP_TYPE)
       }
     
    +  test("Apply Python step if main resource is python.") {
    +    val conf = KubernetesConf(
    --- End diff --
    
    Unrelated to this PR, but @mccheah, should we have something like the fluent/builder pattern here for `KubernetesConf` since it's grown to quite a few params. I'm happy to take a stab at it if we agree that's a good direction.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186240961
  
    --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala ---
    @@ -55,7 +55,8 @@ class KubernetesConfSuite extends SparkFunSuite {
           APP_ID,
           None,
           MAIN_CLASS,
    -      APP_ARGS)
    +      APP_ARGS,
    +      None)
    --- End diff --
    
    Still want names.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status failure
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2729/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186244541
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/bindings/PythonDriverFeatureStep.scala ---
    @@ -0,0 +1,59 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.deploy.k8s.features.bindings
    +
    +import io.fabric8.kubernetes.api.model.ContainerBuilder
    +import io.fabric8.kubernetes.api.model.HasMetadata
    +
    +import org.apache.spark.deploy.k8s.{KubernetesConf, KubernetesRoleSpecificConf, SparkPod}
    +import org.apache.spark.deploy.k8s.Constants._
    +import org.apache.spark.deploy.k8s.KubernetesUtils
    +import org.apache.spark.deploy.k8s.features.KubernetesFeatureConfigStep
    +
    +private[spark] class PythonDriverFeatureStep(
    +  kubernetesConf: KubernetesConf[_ <: KubernetesRoleSpecificConf])
    +  extends KubernetesFeatureConfigStep {
    +  override def configurePod(pod: SparkPod): SparkPod = {
    +    val mainResource = kubernetesConf.pySparkMainResource()
    +    require(mainResource.isDefined, "PySpark Main Resource must be defined")
    +    val otherPyFiles = kubernetesConf.pyFiles().map(pyFile =>
    +      KubernetesUtils.resolveFileUrisAndPath(pyFile.split(","))
    +        .mkString(":")).getOrElse("")
    +    val withPythonPrimaryFileContainer = new ContainerBuilder(pod.container)
    +      .addNewEnv()
    +        .withName(ENV_PYSPARK_ARGS)
    +        .withValue(kubernetesConf.pySparkAppArgs().getOrElse(""))
    +        .endEnv()
    +      .addNewEnv()
    +        .withName(ENV_PYSPARK_PRIMARY)
    +        .withValue(KubernetesUtils.resolveFileUri(mainResource.get))
    +        .endEnv()
    +      .addNewEnv()
    +        .withName(ENV_PYSPARK_FILES)
    +        .withValue(if (otherPyFiles == "") {""} else otherPyFiles)
    --- End diff --
    
    Don't add empty env vars - see above.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by foxish <gi...@git.apache.org>.
Github user foxish commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r182518301
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile ---
    @@ -0,0 +1,33 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +ARG base_img
    +FROM $base_img
    +WORKDIR /
    +COPY python /opt/spark/python
    +RUN apk add --no-cache python && \
    +    python -m ensurepip && \
    +    rm -r /usr/lib/python*/ensurepip && \
    +    pip install --upgrade pip setuptools && \
    +    rm -r /root/.cache
    +ENV PYTHON_VERSION 2.7.13
    --- End diff --
    
    If we set this, are we implicitly imposing a contract on the base image to have this particular version of python installed?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183078482
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---
    @@ -88,15 +94,22 @@ private[spark] class BasicDriverFeatureStep(
             .addToRequests("memory", driverMemoryQuantity)
             .addToLimits("memory", driverMemoryQuantity)
             .endResources()
    -      .addToArgs("driver")
    +      .addToArgs(driverDockerContainer)
           .addToArgs("--properties-file", SPARK_CONF_PATH)
           .addToArgs("--class", conf.roleSpecificConf.mainClass)
    -      // The user application jar is merged into the spark.jars list and managed through that
    -      // property, so there is no need to reference it explicitly here.
    -      .addToArgs(SparkLauncher.NO_RESOURCE)
    -      .addToArgs(conf.roleSpecificConf.appArgs: _*)
    -      .build()
     
    +    val driverContainer =
    +      if (driverDockerContainer == "driver-py") {
    --- End diff --
    
    The second way is the approach that I envisioned and tried to implement. It seems that the approach (without putting too much work on the KubernetesConf) breaks the contract we defined tho. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r192241796
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---
    @@ -48,7 +48,8 @@ private[spark] class BasicDriverFeatureStep(
       private val driverMemoryMiB = conf.get(DRIVER_MEMORY)
       private val memoryOverheadMiB = conf
         .get(DRIVER_MEMORY_OVERHEAD)
    -    .getOrElse(math.max((MEMORY_OVERHEAD_FACTOR * driverMemoryMiB).toInt, MEMORY_OVERHEAD_MIN_MIB))
    +    .getOrElse(math.max((conf.get(MEMORY_OVERHEAD_FACTOR).getOrElse(0.1) * driverMemoryMiB).toInt,
    --- End diff --
    
    Isn't it easier to set it here as such, since it should default to `0.1` unless the non-JVM check modifies it to`0.4`? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Will resolve comments today @mccheah


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3057/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    jenkins, retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r182650258
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile ---
    @@ -0,0 +1,33 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +ARG base_img
    +FROM $base_img
    +WORKDIR /
    +COPY python /opt/spark/python
    +RUN apk add --no-cache python && \
    +    python -m ensurepip && \
    +    rm -r /usr/lib/python*/ensurepip && \
    +    pip install --upgrade pip setuptools && \
    +    rm -r /root/.cache
    +ENV PYTHON_VERSION 2.7.13
    --- End diff --
    
    in some OSes, python vs python3 symlink to the installed version of python, respectively for the version 2.x and 3.x, is that a better approach then hardcoding the version number?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r194107584
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
    @@ -154,6 +176,24 @@ private[spark] object Config extends Logging {
           .checkValue(interval => interval > 0, s"Logging interval must be a positive time value.")
           .createWithDefaultString("1s")
     
    +  val MEMORY_OVERHEAD_FACTOR =
    +    ConfigBuilder("spark.kubernetes.memoryOverheadFactor")
    +      .doc("This sets the Memory Overhead Factor that will allocate memory to non-JVM jobs " +
    +        "which in the case of JVM tasks will default to 0.10 and 0.40 for non-JVM jobs")
    +      .doubleConf
    +      .checkValue(mem_overhead => mem_overhead >= 0 && mem_overhead < 1,
    +        "Ensure that memory overhead is a double between 0 --> 1.0")
    +      .createWithDefault(0.1)
    +
    +  val PYSPARK_MAJOR_PYTHON_VERSION =
    +    ConfigBuilder("spark.kubernetes.pyspark.pythonversion")
    +      .doc("This sets the python version. Either 2 or 3. (Python2 or Python3)")
    +      .stringConf
    +      .checkValue(pv => List("2", "3").contains(pv),
    +        "Ensure that Python Version is either Python2 or Python3")
    +      .createWithDefault("2")
    --- End diff --
    
    I'm fine with either as the default. While Py2 is officially EOL I think we'll still see PySpark Py2 apps for awhile after.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186238816
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala ---
    @@ -101,17 +112,29 @@ private[spark] object KubernetesConf {
           appId: String,
           mainAppResource: Option[MainAppResource],
           mainClass: String,
    -      appArgs: Array[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
    +      appArgs: Array[String],
    +      maybePyFiles: Option[String]): KubernetesConf[KubernetesDriverSpecificConf] = {
         val sparkConfWithMainAppJar = sparkConf.clone()
    +    val additionalFiles = mutable.ArrayBuffer.empty[String]
         mainAppResource.foreach {
    -      case JavaMainAppResource(res) =>
    -        val previousJars = sparkConf
    -          .getOption("spark.jars")
    -          .map(_.split(","))
    -          .getOrElse(Array.empty)
    -        if (!previousJars.contains(res)) {
    -          sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    -        }
    +        case JavaMainAppResource(res) =>
    +          val previousJars = sparkConf
    +            .getOption("spark.jars")
    +            .map(_.split(","))
    +            .getOrElse(Array.empty)
    +          if (!previousJars.contains(res)) {
    +            sparkConfWithMainAppJar.setJars(previousJars ++ Seq(res))
    +          }
    +        case nonJVM: NonJVMResource =>
    +          nonJVM match {
    +            case PythonMainAppResource(res) =>
    +              additionalFiles += res
    +              maybePyFiles.foreach{maybePyFiles =>
    +                additionalFiles.appendAll(maybePyFiles.split(","))}
    --- End diff --
    
    Not for this PR or JIRA, but for later maybe we should normalize our parsing of input files in a way which allows escape characters and share the logic between Yarn/K8s/Mesos/standalone. What do y'all think? Possible follow up JIRA: https://issues.apache.org/jira/browse/SPARK-24184


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    KubernetesSuite:
    - Run SparkPi with no resources
    - Run SparkPi with a very long application name.
    - Run SparkPi with a master URL without a scheme.
    - Run SparkPi with an argument.
    - Run SparkPi with custom labels, annotations, and environment variables.
    - Run SparkPi with a test secret mounted into the driver and executor pods
    - Run extraJVMOptions check on driver
    - Run SparkRemoteFileTest using a remote data file
    - Run PySpark on simple pi.py example
    - Run PySpark with Python2 to test a pyfiles example
    - Run PySpark with Python3 to test a pyfiles example
    Run completed in 4 minutes, 28 seconds.
    Total number of tests run: 11
    Suites: completed 2, aborted 0
    Tests: succeeded 11, failed 0, canceled 0, ignored 0, pending 0
    All tests passed.
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 05:24 min
    [INFO] Finished at: 2018-06-07T18:54:42-04:00
    [INFO] Final Memory: 21M/509M
    [INFO] ------------------------------------------------------------------------
    
    For new addition to: https://github.com/apache-spark-on-k8s/spark-integration/pull/46 



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    **[Test build #91537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91537/testReport)** for PR 21092 at commit [`ab92913`](https://github.com/apache/spark/commit/ab92913f1d0c303a5ff6b2937019d5e47c25611d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status success
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3146/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r194117385
  
    --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala ---
    @@ -88,15 +90,42 @@ class KubernetesConfSuite extends SparkFunSuite {
           APP_NAME,
           RESOURCE_NAME_PREFIX,
           APP_ID,
    -      None,
    +      mainAppResource = None,
           MAIN_CLASS,
    -      APP_ARGS)
    +      APP_ARGS,
    +      maybePyFiles = None)
         assert(kubernetesConfWithoutMainJar.sparkConf.get("spark.jars").split(",")
           === Array("local:///opt/spark/jar1.jar"))
    +    assert(kubernetesConfWithoutMainJar.sparkConf.get(MEMORY_OVERHEAD_FACTOR) === 0.1)
       }
     
    -  test("Resolve driver labels, annotations, secret mount paths, and envs.") {
    +  test("Creating driver conf with a python primary file") {
    +    val mainResourceFile = "local:///opt/spark/main.py"
    +    val inputPyFiles = Array("local:///opt/spark/example2.py", "local:///example3.py")
         val sparkConf = new SparkConf(false)
    +      .setJars(Seq("local:///opt/spark/jar1.jar"))
    +      .set("spark.files", "local:///opt/spark/example4.py")
    +    val mainAppResource = Some(PythonMainAppResource(mainResourceFile))
    +    val kubernetesConfWithMainResource = KubernetesConf.createDriverConf(
    +      sparkConf,
    +      APP_NAME,
    +      RESOURCE_NAME_PREFIX,
    +      APP_ID,
    +      mainAppResource,
    +      MAIN_CLASS,
    +      APP_ARGS,
    +      Some(inputPyFiles.mkString(",")))
    +    assert(kubernetesConfWithMainResource.sparkConf.get("spark.jars").split(",")
    +      === Array("local:///opt/spark/jar1.jar"))
    +    assert(kubernetesConfWithMainResource.sparkConf.get(MEMORY_OVERHEAD_FACTOR) === 0.4)
    --- End diff --
    
    Done


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by ifilonenko <gi...@git.apache.org>.
Github user ifilonenko commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183163103
  
    --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala ---
    @@ -87,11 +89,37 @@ class KubernetesConfSuite extends SparkFunSuite {
           APP_ID,
           None,
           MAIN_CLASS,
    -      APP_ARGS)
    +      APP_ARGS,
    +      None)
         assert(kubernetesConfWithoutMainJar.sparkConf.get("spark.jars").split(",")
           === Array("local:///opt/spark/jar1.jar"))
    +    assert(kubernetesConfWithoutMainJar.sparkConf.get(MEMORY_OVERHEAD_FACTOR) === 0.1)
       }
     
    +  test("Creating driver conf with a python primary file") {
    --- End diff --
    
    Defaults are checked on 96 and 117. (But I need to ensure that it is possible to override as well. Will add)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r182606366
  
    --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilderSuite.scala ---
    @@ -89,6 +97,29 @@ class KubernetesDriverBuilderSuite extends SparkFunSuite {
           SECRETS_STEP_TYPE)
       }
     
    +  test("Apply Python step if main resource is python.") {
    +    val conf = KubernetesConf(
    --- End diff --
    
    Depends on what we want to use the builder for. One advantage of a builder is handled by case classes already: the fact that you don't have to order arguments in a particular way; you can get around this by using named parameters when you construct the object. But, if you want to stage the construction of the object in multiple calls, then a builder will get you that while a case class by itself will not.
    
    I think it would be neater to have a builder. The [SparkSession Builder](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L775) is an example from the project we can follow.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test starting
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2895/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    Kubernetes integration test status success
    URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2753/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186591534
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/bindings/PythonDriverFeatureStep.scala ---
    @@ -0,0 +1,72 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.deploy.k8s.features.bindings
    +
    +import scala.collection.JavaConverters._
    +
    +import io.fabric8.kubernetes.api.model.ContainerBuilder
    +import io.fabric8.kubernetes.api.model.EnvVar
    +import io.fabric8.kubernetes.api.model.EnvVarBuilder
    +import io.fabric8.kubernetes.api.model.HasMetadata
    +
    +import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod}
    +import org.apache.spark.deploy.k8s.Constants._
    +import org.apache.spark.deploy.k8s.KubernetesDriverSpecificConf
    +import org.apache.spark.deploy.k8s.KubernetesUtils
    +import org.apache.spark.deploy.k8s.features.KubernetesFeatureConfigStep
    +
    +private[spark] class PythonDriverFeatureStep(
    +  kubernetesConf: KubernetesConf[KubernetesDriverSpecificConf])
    +  extends KubernetesFeatureConfigStep {
    +  override def configurePod(pod: SparkPod): SparkPod = {
    +    val roleConf = kubernetesConf.roleSpecificConf
    +    require(roleConf.mainAppResource.isDefined, "PySpark Main Resource must be defined")
    +    val maybePythonArgs: Option[EnvVar] = Option(roleConf.appArgs).filter(_.nonEmpty).map(
    --- End diff --
    
    I don't think you should have to declare these types, both for this line and the few others below.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21092: [SPARK-23984][K8S] Initial Python Bindings for PySpark o...

Posted by erikerlandson <gi...@git.apache.org>.
Github user erikerlandson commented on the issue:

    https://github.com/apache/spark/pull/21092
  
    @holdenk I think your [comment above](https://github.com/apache/spark/pull/21092#issuecomment-383211329) gets at a use-case "ambiguity" that containerization causes. There are now at least two choices of channel for supplying dependencies: from the command line, or by customized container (and here there are at least two sub-cases: manually created customizations, or via source-to-image tooling).
    
    When specifying deps via the command line, particularly in cluster mode, we have backed out of staging local files via init-container; does pulling from URI suffice?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r183201223
  
    --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile ---
    @@ -0,0 +1,33 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +ARG base_img
    +FROM $base_img
    +WORKDIR /
    +COPY python /opt/spark/python
    +RUN apk add --no-cache python && \
    +    python -m ensurepip && \
    +    rm -r /usr/lib/python*/ensurepip && \
    +    pip install --upgrade pip setuptools && \
    +    rm -r /root/.cache
    +ENV PYTHON_VERSION 2.7.13
    --- End diff --
    
    perhaps re-use PYSPARK_PYTHON?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by mccheah <gi...@git.apache.org>.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r186244449
  
    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala ---
    @@ -88,15 +94,22 @@ private[spark] class BasicDriverFeatureStep(
             .addToRequests("memory", driverMemoryQuantity)
             .addToLimits("memory", driverMemoryQuantity)
             .endResources()
    -      .addToArgs("driver")
    +      .addToArgs(driverDockerContainer)
           .addToArgs("--properties-file", SPARK_CONF_PATH)
           .addToArgs("--class", conf.roleSpecificConf.mainClass)
    -      // The user application jar is merged into the spark.jars list and managed through that
    -      // property, so there is no need to reference it explicitly here.
    -      .addToArgs(SparkLauncher.NO_RESOURCE)
    -      .addToArgs(conf.roleSpecificConf.appArgs: _*)
    -      .build()
     
    +    val driverContainer =
    +      if (driverDockerContainer == "driver-py") {
    --- End diff --
    
    @ifilonenko I think this still needs some work to clean up.
    
    What I expect to happen is to have three step types:
    
    1. `BasicDriverFeatureStep`, which is what's here except we don't provide the args to the container in this step anymore.
    2. `PythonDriverFeatureStep` which does both what the `PythonDriverFeatureStep` does currently plus adds the `driver-py` argument
    3. `JavaDriverFeatureStep` which only adds the argument `SparkLauncher.NO_RESOURCE`, `conf.roleSpecificConf.appArgs`, etc.
    
    Then in the `KubernetesDriverBuilder`, always apply the first step, and select which of 2 or 3 to apply based on the app resource type.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21092: [SPARK-23984][K8S] Initial Python Bindings for Py...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21092#discussion_r194113403
  
    --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala ---
    @@ -88,15 +90,42 @@ class KubernetesConfSuite extends SparkFunSuite {
           APP_NAME,
           RESOURCE_NAME_PREFIX,
           APP_ID,
    -      None,
    +      mainAppResource = None,
           MAIN_CLASS,
    -      APP_ARGS)
    +      APP_ARGS,
    +      maybePyFiles = None)
         assert(kubernetesConfWithoutMainJar.sparkConf.get("spark.jars").split(",")
           === Array("local:///opt/spark/jar1.jar"))
    +    assert(kubernetesConfWithoutMainJar.sparkConf.get(MEMORY_OVERHEAD_FACTOR) === 0.1)
       }
     
    -  test("Resolve driver labels, annotations, secret mount paths, and envs.") {
    +  test("Creating driver conf with a python primary file") {
    +    val mainResourceFile = "local:///opt/spark/main.py"
    +    val inputPyFiles = Array("local:///opt/spark/example2.py", "local:///example3.py")
         val sparkConf = new SparkConf(false)
    +      .setJars(Seq("local:///opt/spark/jar1.jar"))
    +      .set("spark.files", "local:///opt/spark/example4.py")
    +    val mainAppResource = Some(PythonMainAppResource(mainResourceFile))
    +    val kubernetesConfWithMainResource = KubernetesConf.createDriverConf(
    +      sparkConf,
    +      APP_NAME,
    +      RESOURCE_NAME_PREFIX,
    +      APP_ID,
    +      mainAppResource,
    +      MAIN_CLASS,
    +      APP_ARGS,
    +      Some(inputPyFiles.mkString(",")))
    +    assert(kubernetesConfWithMainResource.sparkConf.get("spark.jars").split(",")
    +      === Array("local:///opt/spark/jar1.jar"))
    +    assert(kubernetesConfWithMainResource.sparkConf.get(MEMORY_OVERHEAD_FACTOR) === 0.4)
    --- End diff --
    
    Just as we discussed earlier testing this value explicitly configured with Python would be good to have as well.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org