Posted to reviews@spark.apache.org by "galacticgumshoe (via GitHub)" <gi...@apache.org> on 2023/07/25 19:59:41 UTC

[GitHub] [spark-docker] galacticgumshoe opened a new pull request, #52: Add Support for Scala 2.13 in Spark 3.4.1

galacticgumshoe opened a new pull request, #52:
URL: https://github.com/apache/spark-docker/pull/52

   
   ### What changes were proposed in this pull request? 
   Add a Scala 2.13 Dockerfile to the Spark 3.4.1 sub-folder for building and tagging the `3.4.1-scala2.13-java11-ubuntu`, `3.4.1-scala2.13`, and `scala2.13` images.
   
   ### Why are the changes needed? 
   To provide support for projects using Scala 2.13 and Spark 3.4.1.
   
   ### Does this PR introduce _any_ user-facing change? 
   Yes, this would add new Docker tags: `3.4.1-scala2.13-java11-ubuntu`, `3.4.1-scala2.13`, `scala2.13`.
   
   ### How was this patch tested? 
   1. Ran `./add-dockerfiles.sh 3.4.1` locally.
   2. After verifying that the expected directory, Dockerfile, and entrypoint.sh were generated in `3.4.1/scala2.13-java11-ubuntu`, built an image locally: `docker build --platform linux/amd64 -t <image>:<tag> 3.4.1/scala2.13-java11-ubuntu/`
   3. Verified the correct binary was downloaded in the 2.13 image: https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3-scala2.13.tgz
   4. Ran spark-shell and verified versions (2.13): `docker run -it <image>:<tag> /opt/spark/bin/spark-shell`
   > Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /___/ .__/\_,_/_/ /_/\_\   version 3.4.1
         /_/
   Using Scala version 2.13.8 (OpenJDK 64-Bit Server VM, Java 11.0.19)
   5. Also built the scala2.12 image locally: `docker build --platform linux/amd64 -t <image>:<tag> 3.4.1/scala2.12-java11-ubuntu/`
   6. Verified the correct binary was downloaded in the 2.12 image: https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz
   7. Ran spark-shell and verified versions (2.12): `docker run -it <image>:<tag> /opt/spark/bin/spark-shell`
   > Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /___/ .__/\_,_/_/ /_/\_\   version 3.4.1
         /_/
   Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 11.0.19)
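
The manual banner check in the steps above could also be scripted. The following is only a minimal sketch, assuming the spark-shell output has already been captured as text; `parse_banner` and `scala_binary_version` are hypothetical helper names, not part of spark-docker.

```python
import re

# Banner text copied from the 2.13 run above (raw string keeps the backslashes).
SAMPLE_BANNER = r"""Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.4.1
      /_/
Using Scala version 2.13.8 (OpenJDK 64-Bit Server VM, Java 11.0.19)
"""


def parse_banner(banner):
    """Hypothetical helper: pull Spark, Scala and Java versions out of the banner."""
    # The Spark version is the only "version X.Y.Z" that ends its line.
    spark = re.search(r"version (\d+(?:\.\d+)+)\s*$", banner, re.MULTILINE)
    scala = re.search(r"Using Scala version (\d+(?:\.\d+)+)", banner)
    java = re.search(r"Java (\d+(?:\.\d+)*)", banner)
    if not (spark and scala and java):
        raise ValueError("banner does not look like spark-shell output")
    return {"spark": spark.group(1), "scala": scala.group(1), "java": java.group(1)}


def scala_binary_version(full_version):
    """Map a full Scala version such as 2.13.8 to its binary version 2.13."""
    return ".".join(full_version.split(".")[:2])


versions = parse_banner(SAMPLE_BANNER)
print(versions, scala_binary_version(versions["scala"]))
```

Comparing `scala_binary_version(...)` against the Scala version in the image directory name would catch an image that accidentally shipped the wrong tarball.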


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark-docker] Yikun commented on a diff in pull request #52: Add Support for Scala 2.13 in Spark 3.4.1

Posted by "Yikun (via GitHub)" <gi...@apache.org>.
Yikun commented on code in PR #52:
URL: https://github.com/apache/spark-docker/pull/52#discussion_r1274663861


##########
3.4.1/scala2.12-java11-ubuntu/Dockerfile:
##########
@@ -36,15 +36,16 @@ RUN set -ex; \
 
 # Install Apache Spark
 # https://downloads.apache.org/spark/KEYS
-ENV SPARK_TGZ_URL=https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz \
-    SPARK_TGZ_ASC_URL=https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz.asc \
+ENV SPARK_TGZ_URL=https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin- \
+    SPARK_TGZ_ASC_URL=https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin- \
     GPG_KEY=F28C9C925C188C35E345614DEDA00CE834F0FC5C
 
 RUN set -ex; \
+    if [ "2.12" = "2.13" ]; then export BIN_FILE_SUFFIX="hadoop3-scala2.13.tgz"; else export BIN_FILE_SUFFIX="hadoop3.tgz"; fi; \

Review Comment:
   As I mentioned on `Dockerfile.template`, if we handle the suffix concatenation in `template.py`, this Dockerfile will not need to change.



##########
versions.json:
##########
@@ -22,8 +22,16 @@
       "path": "3.4.1/scala2.12-java11-ubuntu",
       "tags": [
         "3.4.1-scala2.12-java11-ubuntu",
-        "3.4.1-scala",
-        "scala"
+        "3.4.1-scala2.12",
+        "scala2.12"

Review Comment:
   ```suggestion
           "scala2.12",
           "3.4.1-scala",
           "scala"
   ```
   
   I believe the `scala` and `3.4.1-scala` tags should also be kept.
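
The tag layout requested in this comment can be written down as a small rule. This is a sketch under assumptions, not how spark-docker generates `versions.json`: `scala_image_tags` and its `default_scala` flag are hypothetical, and the `java11-ubuntu` part is hard-coded to match the tags quoted in this thread.

```python
def scala_image_tags(spark_version, scala_version, default_scala=False):
    """Hypothetical helper: tags for the plain Scala image of one Spark release."""
    base = f"scala{scala_version}"
    tags = [
        f"{spark_version}-{base}-java11-ubuntu",  # fully qualified tag
        f"{spark_version}-{base}",                # release + Scala version
        base,                                     # floating Scala-version tag
    ]
    if default_scala:
        # Only the default Scala build keeps the unversioned "scala" tags.
        tags += [f"{spark_version}-scala", "scala"]
    return tags


print(scala_image_tags("3.4.1", "2.12", default_scala=True))
print(scala_image_tags("3.4.1", "2.13"))
```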



##########
add-dockerfiles.sh:
##########
@@ -33,6 +33,7 @@ scala2.12-java11-python3-r-ubuntu
 scala2.12-java11-python3-ubuntu
 scala2.12-java11-r-ubuntu
 scala2.12-java11-ubuntu
+scala2.13-java11-ubuntu

Review Comment:
   You only added the scala image. Do you think python3/r/all should also be added at some point? (Just a question; IMO we can add scala2.13 now and add the others on demand in the future.)
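
For reference, the full set of variant directories for several Scala versions follows a regular naming pattern. The sketch below only mirrors the four directory names visible in the diff above; `variant_dirs` is a hypothetical helper and not the logic of `add-dockerfiles.sh`.

```python
def variant_dirs(scala_versions, java="java11", os_name="ubuntu"):
    """Hypothetical helper: list image directory names for each Scala version."""
    # Empty string stands for the plain Scala image without Python or R.
    flavors = ["python3-r", "python3", "r", ""]
    dirs = []
    for scala in scala_versions:
        for flavor in flavors:
            parts = [f"scala{scala}", java, flavor, os_name]
            dirs.append("-".join(p for p in parts if p))
    return dirs


for name in variant_dirs(["2.12", "2.13"]):
    print(name)
```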



##########
Dockerfile.template:
##########
@@ -36,15 +36,16 @@ RUN set -ex; \
 
 # Install Apache Spark
 # https://downloads.apache.org/spark/KEYS
-ENV SPARK_TGZ_URL=https://archive.apache.org/dist/spark/spark-{{ SPARK_VERSION }}/spark-{{ SPARK_VERSION }}-bin-hadoop3.tgz \
-    SPARK_TGZ_ASC_URL=https://archive.apache.org/dist/spark/spark-{{ SPARK_VERSION }}/spark-{{ SPARK_VERSION }}-bin-hadoop3.tgz.asc \

Review Comment:
   ```
   ENV SPARK_TGZ_URL={{ SPARK_TGZ_URL }} \
       SPARK_TGZ_ASC_URL={{ SPARK_TGZ_ASC_URL }} \
   ```
   
   Could we change only the template here, and handle the suffix and prefix in https://github.com/apache/spark-docker/blob/master/tools/template.py?
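
One way the URL assembly could move out of the Dockerfile is sketched below. It is an illustration only and not the actual contents of `tools/template.py`: `spark_tgz_urls`, `ARCHIVE_ROOT`, and `DEFAULT_SCALA` are hypothetical names, and treating 2.12 as the unsuffixed default is an assumption based on the 3.4.1 tarball names quoted in this thread.

```python
ARCHIVE_ROOT = "https://archive.apache.org/dist/spark"
# Assumption: for 3.4.1 the unsuffixed tarball is the Scala 2.12 build.
DEFAULT_SCALA = "2.12"


def spark_tgz_urls(spark_version, scala_version):
    """Hypothetical helper: values to render into SPARK_TGZ_URL / SPARK_TGZ_ASC_URL."""
    if scala_version == DEFAULT_SCALA:
        suffix = "hadoop3"
    else:
        suffix = f"hadoop3-scala{scala_version}"
    url = f"{ARCHIVE_ROOT}/spark-{spark_version}/spark-{spark_version}-bin-{suffix}.tgz"
    return {"SPARK_TGZ_URL": url, "SPARK_TGZ_ASC_URL": url + ".asc"}


print(spark_tgz_urls("3.4.1", "2.13")["SPARK_TGZ_URL"])
```

With the full URLs computed before rendering, the generated Dockerfile needs no shell `if` on the Scala version.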





Re: [PR] Add Support for Scala 2.13 in Spark 3.4.1 [spark-docker]

Posted by "galacticgumshoe (via GitHub)" <gi...@apache.org>.
galacticgumshoe commented on code in PR #52:
URL: https://github.com/apache/spark-docker/pull/52#discussion_r1344564474


##########
.github/workflows/build_3.4.1_2.13.yaml:
##########
@@ -0,0 +1,41 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Build and Test (3.4.1) for Scala 2.13"
+
+on:
+  pull_request:
+    branches:
+      - 'master'
+    paths:
+      - '3.4.1/**'
+
+jobs:
+  run-build:
+    strategy:
+      matrix:
+        image-type: ["all", "python", "scala", "r"]
+    name: Run
+    secrets: inherit
+    uses: ./.github/workflows/main.yml
+    with:
+      spark: 3.4.1
+      scala: 2.13

Review Comment:
   Should be fixed now.





Re: [PR] Add Support for Scala 2.13 in Spark 3.4.1 [spark-docker]

Posted by "galacticgumshoe (via GitHub)" <gi...@apache.org>.
galacticgumshoe commented on PR #52:
URL: https://github.com/apache/spark-docker/pull/52#issuecomment-1745686684

   @Yikun After syncing my fork with the changes merged to master since this PR was opened, I believe this is ready for review. However, since 3.3.3 and 3.5.0 were added recently, would it make sense to add Scala 2.13 support to these latest versions as well? Or to all versions?




Re: [PR] Add Support for Scala 2.13 in Spark 3.4.1 [spark-docker]

Posted by "databius (via GitHub)" <gi...@apache.org>.
databius commented on PR #52:
URL: https://github.com/apache/spark-docker/pull/52#issuecomment-1978112303

   It would be great if we could support old versions instead of only spark 3.5+.
   I need an image that supports scala 2.13 and spark 3.4.2.
   Currently, I am building my own image based on this PR. Hopefully the official image will be published soon.




[GitHub] [spark-docker] Yikun commented on pull request #52: Add Support for Scala 2.13 in Spark 3.4.1

Posted by "Yikun (via GitHub)" <gi...@apache.org>.
Yikun commented on PR #52:
URL: https://github.com/apache/spark-docker/pull/52#issuecomment-1680049961

   CI should be fixed by https://github.com/apache/spark-docker/pull/53; you can rebase after it merges.




Re: [PR] Add Support for Scala 2.13 in Spark 3.4.1 [spark-docker]

Posted by "caldempsey (via GitHub)" <gi...@apache.org>.
caldempsey commented on PR #52:
URL: https://github.com/apache/spark-docker/pull/52#issuecomment-1973756934

   Also running into this issue




Re: [PR] Add Support for Scala 2.13 in Spark 3.4.1 [spark-docker]

Posted by "holdenk (via GitHub)" <gi...@apache.org>.
holdenk commented on code in PR #52:
URL: https://github.com/apache/spark-docker/pull/52#discussion_r1359026750


##########
versions.json:
##########
@@ -40,18 +40,41 @@
         "3.4.1"
       ]
     },
+    {
+      "path": "3.4.1/scala2.13-java11-python3-ubuntu",
+      "tags": [
+        "3.4.1-scala2.13-java11-python3-ubuntu"
+      ]
+    },
     {
       "path": "3.4.1/scala2.12-java11-r-ubuntu",
       "tags": [
         "3.4.1-scala2.12-java11-r-ubuntu",
         "3.4.1-r"
       ]
     },
+    {
+      "path": "3.4.1/scala2.13-java11-r-ubuntu",
+      "tags": [
+        "3.4.1-scala2.13-java11-r-ubuntu"
+      ]
+    },
     {
       "path": "3.4.1/scala2.12-java11-ubuntu",
       "tags": [
         "3.4.1-scala2.12-java11-ubuntu",
-        "3.4.1-scala"
+        "3.4.1-scala2.12",
+        "scala2.12",
+        "3.4.1-scala",
+        "scala"
+      ]
+    },
+    {
+      "path": "3.4.1/scala2.13-java11-ubuntu",
+      "tags": [
+        "3.4.1-scala2.13-java11-ubuntu",
+        "3.4.1-scala2.13",
+        "scala2.13"

Review Comment:
   Would we want to put the scala 2.13 tag on 3.5 instead, @Yikun?





Re: [PR] Add Support for Scala 2.13 in Spark 3.4.1 [spark-docker]

Posted by "Yikun (via GitHub)" <gi...@apache.org>.
Yikun commented on code in PR #52:
URL: https://github.com/apache/spark-docker/pull/52#discussion_r1359079892


##########
versions.json:
##########
@@ -40,18 +40,41 @@
         "3.4.1"
       ]
     },
+    {
+      "path": "3.4.1/scala2.13-java11-python3-ubuntu",
+      "tags": [
+        "3.4.1-scala2.13-java11-python3-ubuntu"
+      ]
+    },
     {
       "path": "3.4.1/scala2.12-java11-r-ubuntu",
       "tags": [
         "3.4.1-scala2.12-java11-r-ubuntu",
         "3.4.1-r"
       ]
     },
+    {
+      "path": "3.4.1/scala2.13-java11-r-ubuntu",
+      "tags": [
+        "3.4.1-scala2.13-java11-r-ubuntu"
+      ]
+    },
     {
       "path": "3.4.1/scala2.12-java11-ubuntu",
       "tags": [
         "3.4.1-scala2.12-java11-ubuntu",
-        "3.4.1-scala"
+        "3.4.1-scala2.12",
+        "scala2.12",
+        "3.4.1-scala",
+        "scala"
+      ]
+    },
+    {
+      "path": "3.4.1/scala2.13-java11-ubuntu",
+      "tags": [
+        "3.4.1-scala2.13-java11-ubuntu",
+        "3.4.1-scala2.13",
+        "scala2.13"

Review Comment:
   Yes, agreed! As I mentioned in the comment above, 2.13 is better introduced in the latest version (3.5.0 for now).
   
   https://github.com/apache/spark-docker/pull/52#pullrequestreview-1676547189





Re: [PR] Add Support for Scala 2.13 in Spark 3.4.1 [spark-docker]

Posted by "galacticgumshoe (via GitHub)" <gi...@apache.org>.
galacticgumshoe commented on code in PR #52:
URL: https://github.com/apache/spark-docker/pull/52#discussion_r1344702801


##########
3.4.1/scala2.12-java11-ubuntu/Dockerfile:
##########
@@ -43,8 +43,8 @@ ENV SPARK_TGZ_URL=https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-
 RUN set -ex; \
     export SPARK_TMP="$(mktemp -d)"; \
     cd $SPARK_TMP; \
-    wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
-    wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
+    wget -nv -O spark.tgz "${SPARK_TGZ_URL}"; \
+    wget -nv -O spark.tgz.asc "${SPARK_TGZ_ASC_URL}"; \

Review Comment:
   Please note that I merged in the changes from master in 81c2933, which landed after this PR was opened and also changed `entrypoint.sh.template`. So after updating `Dockerfile.template` and rerunning `add-dockerfiles.sh`, both the Dockerfile and `entrypoint.sh` for 3.4.1/2.12 have changed.





[GitHub] [spark-docker] galacticgumshoe commented on a diff in pull request #52: Add Support for Scala 2.13 in Spark 3.4.1

Posted by "galacticgumshoe (via GitHub)" <gi...@apache.org>.
galacticgumshoe commented on code in PR #52:
URL: https://github.com/apache/spark-docker/pull/52#discussion_r1277934981


##########
3.4.1/scala2.12-java11-ubuntu/Dockerfile:
##########
@@ -36,15 +36,16 @@ RUN set -ex; \
 
 # Install Apache Spark
 # https://downloads.apache.org/spark/KEYS
-ENV SPARK_TGZ_URL=https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz \
-    SPARK_TGZ_ASC_URL=https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin-hadoop3.tgz.asc \
+ENV SPARK_TGZ_URL=https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin- \
+    SPARK_TGZ_ASC_URL=https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-bin- \
     GPG_KEY=F28C9C925C188C35E345614DEDA00CE834F0FC5C
 
 RUN set -ex; \
+    if [ "2.12" = "2.13" ]; then export BIN_FILE_SUFFIX="hadoop3-scala2.13.tgz"; else export BIN_FILE_SUFFIX="hadoop3.tgz"; fi; \

Review Comment:
   Agreed.



##########
versions.json:
##########
@@ -22,8 +22,16 @@
       "path": "3.4.1/scala2.12-java11-ubuntu",
       "tags": [
         "3.4.1-scala2.12-java11-ubuntu",
-        "3.4.1-scala",
-        "scala"
+        "3.4.1-scala2.12",
+        "scala2.12"

Review Comment:
   Will fix!





[GitHub] [spark-docker] Yikun commented on a diff in pull request #52: Add Support for Scala 2.13 in Spark 3.4.1

Posted by "Yikun (via GitHub)" <gi...@apache.org>.
Yikun commented on code in PR #52:
URL: https://github.com/apache/spark-docker/pull/52#discussion_r1295447530


##########
3.4.1/scala2.12-java11-ubuntu/Dockerfile:
##########
@@ -43,8 +43,8 @@ ENV SPARK_TGZ_URL=https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-
 RUN set -ex; \
     export SPARK_TMP="$(mktemp -d)"; \
     cd $SPARK_TMP; \
-    wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
-    wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
+    wget -nv -O spark.tgz "${SPARK_TGZ_URL}"; \
+    wget -nv -O spark.tgz.asc "${SPARK_TGZ_ASC_URL}"; \

Review Comment:
   Is this a required change? If not, I think we'd better revert it.



##########
.github/workflows/build_3.4.1_2.13.yaml:
##########
@@ -0,0 +1,41 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Build and Test (3.4.1) for Scala 2.13"
+
+on:
+  pull_request:
+    branches:
+      - 'master'
+    paths:
+      - '3.4.1/**'
+
+jobs:
+  run-build:
+    strategy:
+      matrix:
+        image-type: ["all", "python", "scala", "r"]
+    name: Run
+    secrets: inherit
+    uses: ./.github/workflows/main.yml
+    with:
+      spark: 3.4.1
+      scala: 2.13

Review Comment:
   I think we can move `scala: 2.13` into the strategy matrix and fold this file into `build_3.4.1.yaml`:
   
   ```
       strategy:
         matrix:
           image-type: ["all", "python", "scala", "r"]
           scala: ["2.13", "2.12"]
   ```
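
A GitHub Actions matrix expands to the cross product of its axes, so the suggested matrix yields eight jobs instead of four. The sketch below merely enumerates those combinations in Python to make the job count concrete; `expand_matrix` is a hypothetical helper, not part of any workflow tooling.

```python
from itertools import product

# The two axes from the suggested strategy block.
matrix = {
    "image-type": ["all", "python", "scala", "r"],
    "scala": ["2.13", "2.12"],
}


def expand_matrix(axes):
    """Hypothetical helper: list every job combination of a matrix."""
    keys = list(axes)
    return [dict(zip(keys, combo)) for combo in product(*(axes[k] for k in keys))]


jobs = expand_matrix(matrix)
for job in jobs:
    print(job)
```

Each combination then has to be passed to the reusable workflow explicitly, which is why the `scala` input still needs a value taken from the matrix.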





Re: [PR] Add Support for Scala 2.13 in Spark 3.4.1 [spark-docker]

Posted by "Yikun (via GitHub)" <gi...@apache.org>.
Yikun commented on code in PR #52:
URL: https://github.com/apache/spark-docker/pull/52#discussion_r1358343497


##########
3.4.1/scala2.12-java11-ubuntu/entrypoint.sh:
##########
@@ -77,6 +77,9 @@ elif ! [ -z "${SPARK_HOME+x}" ]; then
   SPARK_CLASSPATH="$SPARK_HOME/conf:$SPARK_CLASSPATH";
 fi
 
+# SPARK-43540: add current working directory into executor classpath

Review Comment:
   This change belongs to the 3.5.x series, so we might create `entrypoint.sh.3.4.template` and `Dockerfile.3.4.template`.
   
   We would also [change](https://github.com/apache/spark-docker/blob/master/add-dockerfiles.sh#L53) `add-dockerfiles.sh`: for a 3.x.1 version, if `3.x` templates exist, use them; otherwise use the master entrypoint and Dockerfile templates directly.
   
   It can be a separate PR.
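
The fallback rule described here could look roughly like the following. It is a sketch only: `pick_template` is a hypothetical name, the set of available file names is passed in rather than read from disk, and the real `add-dockerfiles.sh` is a shell script that may resolve templates differently.

```python
def pick_template(name, spark_version, available):
    """Hypothetical helper: prefer a minor-version template, else the master one.

    name          -- "Dockerfile" or "entrypoint.sh"
    spark_version -- full version such as "3.4.1"
    available     -- set of template file names present in the repository
    """
    minor = ".".join(spark_version.split(".")[:2])  # "3.4.1" -> "3.4"
    versioned = f"{name}.{minor}.template"
    return versioned if versioned in available else f"{name}.template"


templates = {
    "Dockerfile.template",
    "entrypoint.sh.template",
    "Dockerfile.3.4.template",
    "entrypoint.sh.3.4.template",
}
print(pick_template("entrypoint.sh", "3.4.1", templates))
print(pick_template("entrypoint.sh", "3.5.0", templates))
```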



##########
versions.json:
##########
@@ -40,18 +40,41 @@
         "3.4.1"
       ]
     },
+    {
+      "path": "3.4.1/scala2.13-java11-python3-ubuntu",
+      "tags": [
+        "3.4.1-scala2.13-java11-python3-ubuntu"
+      ]
+    },
     {
       "path": "3.4.1/scala2.12-java11-r-ubuntu",
       "tags": [
         "3.4.1-scala2.12-java11-r-ubuntu",
         "3.4.1-r"
       ]
     },
+    {
+      "path": "3.4.1/scala2.13-java11-r-ubuntu",
+      "tags": [
+        "3.4.1-scala2.13-java11-r-ubuntu"
+      ]
+    },
     {
       "path": "3.4.1/scala2.12-java11-ubuntu",
       "tags": [
         "3.4.1-scala2.12-java11-ubuntu",
-        "3.4.1-scala"
+        "3.4.1-scala2.12",
+        "scala2.12",
+        "3.4.1-scala",
+        "scala"

Review Comment:
   `scala` should be removed here, because the latest version is 3.5.0 now.



##########
3.4.1/scala2.12-java11-ubuntu/entrypoint.sh:
##########
@@ -77,6 +77,9 @@ elif ! [ -z "${SPARK_HOME+x}" ]; then
   SPARK_CLASSPATH="$SPARK_HOME/conf:$SPARK_CLASSPATH";
 fi
 
+# SPARK-43540: add current working directory into executor classpath

Review Comment:
   cc @HyukjinKwon @zhengruifeng @dongjoon-hyun 
   
   Or we could just apply master dockerfile changes in 3.4-branches?



##########
.github/workflows/build_3.4.1.yaml:
##########
@@ -24,18 +24,18 @@ on:
     branches:
       - 'master'
     paths:
-      - '3.4.1/**'
+      - '3.4.1/scala2.**'
 
 jobs:
   run-build:
     strategy:
       matrix:
         image-type: ["all", "python", "scala", "r"]
+        scala: ["2.13","2.12"]
     name: Run
     secrets: inherit
     uses: ./.github/workflows/main.yml
     with:
       spark: 3.4.1
-      scala: 2.12

Review Comment:
   This shouldn't be removed; without it, CI is broken (not triggered).
   
   Try
   
   ```yaml
   scala: ${{ matrix.scala }}
   ```
   
   [1] https://github.com/apache/spark-docker/actions/runs/6398139681/workflow



##########
3.4.1/scala2.12-java11-ubuntu/entrypoint.sh:
##########
@@ -90,6 +93,7 @@ case "$1" in
     CMD=(
       "$SPARK_HOME/bin/spark-submit"
       --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
+      --conf "spark.executorEnv.SPARK_DRIVER_POD_IP=$SPARK_DRIVER_BIND_ADDRESS"

Review Comment:
   ditto



##########
versions.json:
##########
@@ -40,18 +40,41 @@
         "3.4.1"
       ]
     },
+    {
+      "path": "3.4.1/scala2.13-java11-python3-ubuntu",
+      "tags": [
+        "3.4.1-scala2.13-java11-python3-ubuntu"
+      ]
+    },
     {
       "path": "3.4.1/scala2.12-java11-r-ubuntu",
       "tags": [
         "3.4.1-scala2.12-java11-r-ubuntu",
         "3.4.1-r"
       ]
     },
+    {
+      "path": "3.4.1/scala2.13-java11-r-ubuntu",
+      "tags": [
+        "3.4.1-scala2.13-java11-r-ubuntu"
+      ]
+    },
     {
       "path": "3.4.1/scala2.12-java11-ubuntu",
       "tags": [
         "3.4.1-scala2.12-java11-ubuntu",
-        "3.4.1-scala"
+        "3.4.1-scala2.12",
+        "scala2.12",

Review Comment:
   `scala2.12` should be removed here, because the latest version is 3.5.0 now.
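
Floating tags such as `scala` and `scala2.12` can only point at one image, so a quick check for tags claimed by more than one entry would catch this kind of conflict. The sketch below assumes a list of entries shaped like the `path`/`tags` objects in the diff; the 3.5.0 entry is invented for illustration and `duplicate_tags` is a hypothetical helper.

```python
from collections import defaultdict


def duplicate_tags(entries):
    """Hypothetical helper: map each tag used by several entries to their paths."""
    owners = defaultdict(list)
    for entry in entries:
        for tag in entry["tags"]:
            owners[tag].append(entry["path"])
    return {tag: paths for tag, paths in owners.items() if len(paths) > 1}


entries = [
    {
        "path": "3.4.1/scala2.12-java11-ubuntu",
        "tags": ["3.4.1-scala2.12-java11-ubuntu", "3.4.1-scala2.12",
                 "scala2.12", "3.4.1-scala", "scala"],
    },
    {
        # Invented entry standing in for the newer release.
        "path": "3.5.0/scala2.12-java11-ubuntu",
        "tags": ["3.5.0-scala2.12-java11-ubuntu", "scala2.12", "scala"],
    },
]
conflicts = duplicate_tags(entries)
print(conflicts)
```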







Re: [PR] Add Support for Scala 2.13 in Spark 3.4.1 [spark-docker]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #52:
URL: https://github.com/apache/spark-docker/pull/52#discussion_r1359986873


##########
3.4.1/scala2.12-java11-ubuntu/entrypoint.sh:
##########
@@ -77,6 +77,9 @@ elif ! [ -z "${SPARK_HOME+x}" ]; then
   SPARK_CLASSPATH="$SPARK_HOME/conf:$SPARK_CLASSPATH";
 fi
 
+# SPARK-43540: add current working directory into executor classpath

Review Comment:
   I think creating a template is better.





Re: [PR] Add Support for Scala 2.13 in Spark 3.4.1 [spark-docker]

Posted by "Philosh (via GitHub)" <gi...@apache.org>.
Philosh commented on PR #52:
URL: https://github.com/apache/spark-docker/pull/52#issuecomment-1890765140

   How do I install PySpark 3.4.1 with Scala 2.13? Please help. I only get Scala 2.12.




[GitHub] [spark-docker] galacticgumshoe commented on a diff in pull request #52: Add Support for Scala 2.13 in Spark 3.4.1

Posted by "galacticgumshoe (via GitHub)" <gi...@apache.org>.
galacticgumshoe commented on code in PR #52:
URL: https://github.com/apache/spark-docker/pull/52#discussion_r1277943116


##########
add-dockerfiles.sh:
##########
@@ -33,6 +33,7 @@ scala2.12-java11-python3-r-ubuntu
 scala2.12-java11-python3-ubuntu
 scala2.12-java11-r-ubuntu
 scala2.12-java11-ubuntu
+scala2.13-java11-ubuntu

Review Comment:
   I didn't see any other binary distribution compiled with Scala 2.13 in the [Apache Spark Archives](https://archive.apache.org/dist/spark/spark-3.4.1/), so I held off on trying the other image variants. Now that I see they are built on top of the initial image (from the r-python.template rather than the Dockerfile.template), I can try to add support for them in this PR.
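
As a rough illustration of how one template could serve both Scala versions, the sketch below builds the tarball URL from the Spark and Scala versions. The helper name `spark_tgz_url` is hypothetical, and the `-scala2.13` suffix is an assumption based on the file naming in the archive listing linked above; it is not taken from this PR's final template.

```shell
# Hypothetical helper, not part of spark-docker.
# Assumption: Scala 2.12 tarballs carry no suffix, while Scala 2.13
# tarballs append "-scala2.13" to the "bin-hadoop3" name.
spark_tgz_url() {
  spark_version="$1"
  scala_version="$2"
  suffix=""
  if [ "$scala_version" = "2.13" ]; then
    suffix="-scala2.13"
  fi
  echo "https://archive.apache.org/dist/spark/spark-${spark_version}/spark-${spark_version}-bin-hadoop3${suffix}.tgz"
}

# Print the URL for each Scala version of Spark 3.4.1.
spark_tgz_url 3.4.1 2.12
spark_tgz_url 3.4.1 2.13
```

The matching `.asc` signature URL would follow by appending `.asc` to either result.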





Re: [PR] Add Support for Scala 2.13 in Spark 3.4.1 [spark-docker]

Posted by "galacticgumshoe (via GitHub)" <gi...@apache.org>.
galacticgumshoe closed pull request #52: Add Support for Scala 2.13 in Spark 3.4.1
URL: https://github.com/apache/spark-docker/pull/52




[GitHub] [spark-docker] galacticgumshoe commented on a diff in pull request #52: Add Support for Scala 2.13 in Spark 3.4.1

Posted by "galacticgumshoe (via GitHub)" <gi...@apache.org>.
galacticgumshoe commented on code in PR #52:
URL: https://github.com/apache/spark-docker/pull/52#discussion_r1277934463


##########
Dockerfile.template:
##########
@@ -36,15 +36,16 @@ RUN set -ex; \
 
 # Install Apache Spark
 # https://downloads.apache.org/spark/KEYS
-ENV SPARK_TGZ_URL=https://archive.apache.org/dist/spark/spark-{{ SPARK_VERSION }}/spark-{{ SPARK_VERSION }}-bin-hadoop3.tgz \
-    SPARK_TGZ_ASC_URL=https://archive.apache.org/dist/spark/spark-{{ SPARK_VERSION }}/spark-{{ SPARK_VERSION }}-bin-hadoop3.tgz.asc \

Review Comment:
   Makes sense, will update.





[GitHub] [spark-docker] galacticgumshoe commented on pull request #52: Add Support for Scala 2.13 in Spark 3.4.1

Posted by "galacticgumshoe (via GitHub)" <gi...@apache.org>.
galacticgumshoe commented on PR #52:
URL: https://github.com/apache/spark-docker/pull/52#issuecomment-1656185337

   @Yikun There, I think that should address each of your comments. Let me know otherwise. Thanks for taking a look at this!




Re: [PR] Add Support for Scala 2.13 in Spark 3.4.1 [spark-docker]

Posted by "galacticgumshoe (via GitHub)" <gi...@apache.org>.
galacticgumshoe commented on code in PR #52:
URL: https://github.com/apache/spark-docker/pull/52#discussion_r1344518133


##########
.github/workflows/build_3.4.1_2.13.yaml:
##########
@@ -0,0 +1,41 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Build and Test (3.4.1) for Scala 2.13"
+
+on:
+  pull_request:
+    branches:
+      - 'master'
+    paths:
+      - '3.4.1/**'
+
+jobs:
+  run-build:
+    strategy:
+      matrix:
+        image-type: ["all", "python", "scala", "r"]
+    name: Run
+    secrets: inherit
+    uses: ./.github/workflows/main.yml
+    with:
+      spark: 3.4.1
+      scala: 2.13

Review Comment:
   Oh, ugh, totally fell off my radar. I'll try to work on this again!





Re: [PR] Add Support for Scala 2.13 in Spark 3.4.1 [spark-docker]

Posted by "galacticgumshoe (via GitHub)" <gi...@apache.org>.
galacticgumshoe commented on code in PR #52:
URL: https://github.com/apache/spark-docker/pull/52#discussion_r1344576380


##########
3.4.1/scala2.12-java11-ubuntu/Dockerfile:
##########
@@ -43,8 +43,8 @@ ENV SPARK_TGZ_URL=https://archive.apache.org/dist/spark/spark-3.4.1/spark-3.4.1-
 RUN set -ex; \
     export SPARK_TMP="$(mktemp -d)"; \
     cd $SPARK_TMP; \
-    wget -nv -O spark.tgz "$SPARK_TGZ_URL"; \
-    wget -nv -O spark.tgz.asc "$SPARK_TGZ_ASC_URL"; \
+    wget -nv -O spark.tgz "${SPARK_TGZ_URL}"; \
+    wget -nv -O spark.tgz.asc "${SPARK_TGZ_ASC_URL}"; \

Review Comment:
   While not technically required, this change is a side effect of this PR's edits to the Dockerfile.template: running the `add-dockerfiles.sh` script against 3.4.1 regenerated this Dockerfile with the braces. I will back out the braces in the template and rerun the script to remove them from the generated Dockerfile here. Just documenting the cause of this change.
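
For context on why the braces are optional here, this small sketch compares the two expansion forms. The URL value and the `SPARK_VERSION_demo` name are placeholders chosen for illustration, not values from the Dockerfile.

```shell
# Placeholder value for illustration only.
SPARK_TGZ_URL="https://example.invalid/spark.tgz"

# Both forms expand to the same string when nothing name-like follows.
plain="$SPARK_TGZ_URL"
braced="${SPARK_TGZ_URL}"
if [ "$plain" = "$braced" ]; then
  echo "identical"
fi

# Braces matter only when the next character could be part of a name:
# "$SPARK_VERSION_demo" would look up a variable called SPARK_VERSION_demo.
SPARK_VERSION="3.4.1"
with_braces="spark-${SPARK_VERSION}_demo"
without_braces="spark-$SPARK_VERSION_demo"
echo "$with_braces"
echo "$without_braces"
```

In the `wget` lines of the diff, the variable is followed directly by a closing quote, so either form behaves the same.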





Re: [PR] Add Support for Scala 2.13 in Spark 3.4.1 [spark-docker]

Posted by "Yikun (via GitHub)" <gi...@apache.org>.
Yikun commented on code in PR #52:
URL: https://github.com/apache/spark-docker/pull/52#discussion_r1358346064


##########
3.4.1/scala2.12-java11-ubuntu/entrypoint.sh:
##########
@@ -77,6 +77,9 @@ elif ! [ -z "${SPARK_HOME+x}" ]; then
   SPARK_CLASSPATH="$SPARK_HOME/conf:$SPARK_CLASSPATH";
 fi
 
+# SPARK-43540: add current working directory into executor classpath

Review Comment:
   cc @HyukjinKwon @zhengruifeng @dongjoon-hyun 
   
   Or could we just apply the master Dockerfile changes to the 3.4 images?





Re: [PR] Add Support for Scala 2.13 in Spark 3.4.1 [spark-docker]

Posted by "Yikun (via GitHub)" <gi...@apache.org>.
Yikun commented on code in PR #52:
URL: https://github.com/apache/spark-docker/pull/52#discussion_r1358343497


##########
3.4.1/scala2.12-java11-ubuntu/entrypoint.sh:
##########
@@ -77,6 +77,9 @@ elif ! [ -z "${SPARK_HOME+x}" ]; then
   SPARK_CLASSPATH="$SPARK_HOME/conf:$SPARK_CLASSPATH";
 fi
 
+# SPARK-43540: add current working directory into executor classpath

Review Comment:
   This change belongs to the 3.5.x series. We might create an `entrypoint.sh.3.4.template` and a `Dockerfile.3.4.template`, and we would also need to [change](https://github.com/apache/spark-docker/blob/master/add-dockerfiles.sh#L53) `add-dockerfiles.sh`.
   
   For a 3.x version, if `3.x` templates exist, use the 3.x templates; otherwise use the master entrypoint and Dockerfile directly.
   
   It can be a separate PR.
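
The fallback rule described above could be sketched roughly as follows. The function name `select_template`, its arguments, and the assumption that the master templates are named `<base>.template` are all illustrative; this is not the actual logic in `add-dockerfiles.sh`.

```shell
# Hypothetical helper: choose a version-specific template when one exists,
# otherwise fall back to the master template.
# Assumption: master templates are named "<base>.template" and versioned
# ones "<base>.<major>.<minor>.template", e.g. Dockerfile.3.4.template.
select_template() {
  base="$1"            # e.g. Dockerfile or entrypoint.sh
  spark_version="$2"   # e.g. 3.4.1
  template_dir="$3"    # directory holding the templates
  minor="${spark_version%.*}"   # strip the patch component: 3.4.1 -> 3.4
  if [ -f "${template_dir}/${base}.${minor}.template" ]; then
    echo "${base}.${minor}.template"
  else
    echo "${base}.template"
  fi
}

# Example call against the current directory.
select_template Dockerfile 3.4.1 .
```

Keeping the lookup in one function means a new series only needs a new template file, with no further change to the script.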





[GitHub] [spark-docker] galacticgumshoe commented on pull request #52: Add Support for Scala 2.13 in Spark 3.4.1

Posted by "galacticgumshoe (via GitHub)" <gi...@apache.org>.
galacticgumshoe commented on PR #52:
URL: https://github.com/apache/spark-docker/pull/52#issuecomment-1679573022

   @Yikun Back from vacation, so just catching up again here. It looks like the failure in the workflow stems from this file not found: 
   
   `Exception in thread "main" java.nio.file.NoSuchFileException: /opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar`
   
   I am trying to follow the workflow logic, but some of these dependencies relating to Kubernetes look like they are not controlled by this project. Can you provide any insights on this?
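
One possible reading of the error, offered only as a hypothesis: the examples jar name embeds the Scala binary version, so a test that expects `_2.12` will not find the jar in a Scala 2.13 image. The sketch below derives the path from both versions; the helper name `examples_jar` is made up for illustration, and whether the workflow actually hard-codes 2.12 is not confirmed in this thread.

```shell
# Hypothetical helper: build the examples jar path from the Scala and
# Spark versions instead of assuming Scala 2.12.
examples_jar() {
  scala_version="$1"   # Scala binary version, e.g. 2.13
  spark_version="$2"   # e.g. 3.4.1
  echo "/opt/spark/examples/jars/spark-examples_${scala_version}-${spark_version}.jar"
}

# Path a Scala 2.13 image would be expected to contain.
examples_jar 2.13 3.4.1
```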

