Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/10/18 11:01:15 UTC

[GitHub] [spark-docker] dcoliversun opened a new pull request, #15: [SPARK-40569] Expose default port for spark driver

dcoliversun opened a new pull request, #15:
URL: https://github.com/apache/spark-docker/pull/15

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If there is design documentation, please add the link.
     2. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   




[GitHub] [spark-docker] Yikun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001328560


##########
.github/workflows/main.yml:
##########
@@ -60,6 +60,9 @@ jobs:
           - ${{ inputs.java }}
         image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu]
     steps:
+      - name : Test - Run spark application for standalone cluster on docker

Review Comment:
   I meant https://github.com/apache/spark-docker/pull/15#discussion_r1000504095: move this before `- name: Test - Checkout Spark repository` (L166).





[GitHub] [spark-docker] dcoliversun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001288052


##########
testing/testing.sh:
##########
@@ -0,0 +1,158 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+CURL_TIMEOUT=1
+CURL_COOLDOWN=1
+CURL_MAX_TRIES=30
+
+NETWORK_NAME=spark-net-bridge
+
+SUBMIT_CONTAINER_NAME=spark-submit
+MASTER_CONTAINER_NAME=spark-master
+WORKER_CONTAINER_NAME=spark-work
+SPARK_MASTER_PORT=7077
+SPARK_MASTER_WEBUI_CONTAINER_PORT=8080
+SPARK_MASTER_WEBUI_HOST_PORT=8080
+SPARK_WORKER_WEBUI_CONTAINER_PORT=8081
+SPARK_WORKER_WEBUI_HOST_PORT=8081
+
+# Create a new docker bridge network
+function create_network() {
+    docker network create --driver bridge "$NETWORK_NAME" > /dev/null
+}
+
+# Remove docker network
+function remove_network() {
+    docker network remove "$NETWORK_NAME" > /dev/null
+}
+
+# Find and kill any remaining containers attached to the network
+function cleanup() {
+    local containers
+    containers="$(docker ps --quiet --filter network="$NETWORK_NAME")"
+
+    if [ -n "$containers" ]; then
+        echo >&2 -n "==> Killing $(echo -n "$containers" | grep -c '^') orphaned container(s)..."
+        echo "$containers" | xargs docker kill > /dev/null
+        echo >&2 " done."
+    fi
+}
+
+function docker_run() {
+    local container_name="$1"
+    local docker_run_command="$2"
+    local args="$3"
+
+    echo >&2 "===> Starting ${container_name}"
+    if [ "$container_name" = "$MASTER_CONTAINER_NAME" ]; then
+      eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    elif [ "$container_name" = "$WORKER_CONTAINER_NAME" ]; then
+      eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    else
+      eval "docker run --rm --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    fi
+}
+
+function start_spark_master() {
+    docker_run \
+      "$MASTER_CONTAINER_NAME" \
+      "--publish $SPARK_MASTER_WEBUI_HOST_PORT:$SPARK_MASTER_WEBUI_CONTAINER_PORT $1" \
+      "/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master" > /dev/null
+}
+
+function start_spark_worker() {
+    docker_run \
+    "$WORKER_CONTAINER_NAME" \
+    "--publish $SPARK_WORKER_WEBUI_HOST_PORT:$SPARK_WORKER_WEBUI_CONTAINER_PORT $1" \
+    "/opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT" > /dev/null
+}
+
+function wait_container_ready() {
+    local container_name="$1"
+    local host_port="$2"
+    i=0
+    echo >&2 "===> Waiting for ${container_name} to be ready..."
+    while true; do
+        i=$((i+1))
+
+        set +e
+
+        curl \
+          --silent \
+          --max-time "$CURL_TIMEOUT" \
+          localhost:"${host_port}" \
+          > /dev/null
+
+        result=$?
+
+        set -e
+
+        if [ "$result" -eq 0 ]; then
+            break
+        fi
+
+        if [ "$i" -gt "$CURL_MAX_TRIES" ]; then
+            echo >&2 "===> \$CURL_MAX_TRIES exceeded waiting for ${container_name} to be ready"
+            return 1
+        fi
+
+        sleep "$CURL_COOLDOWN"
+    done
+
+    echo >&2 "===> ${container_name} is ready."
+}
+
+function run_spark_pi() {
+    docker_run \
+      "$SUBMIT_CONTAINER_NAME" \
+      "$1" \
+      "/opt/spark/bin/spark-submit --master spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT --class org.apache.spark.examples.SparkPi /opt/spark/examples/jars/spark-examples_${scala_spark_version}.jar 20"
+}
+
+# Run smoke test
+function run_smoke_test() {

Review Comment:
   Rename to `run_smoke_test_in_standalone`





[GitHub] [spark-docker] dcoliversun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001438210


##########
.github/workflows/main.yml:
##########
@@ -60,6 +60,9 @@ jobs:
           - ${{ inputs.java }}
         image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu]
     steps:
+      - name : Test - Run spark application for standalone cluster on docker

Review Comment:
   Maybe rename `Checkout Spark repository` to `Checkout Spark Docker repository`?





[GitHub] [spark-docker] Yikun commented on pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on PR #15:
URL: https://github.com/apache/spark-docker/pull/15#issuecomment-1294391023

   also cc @holdenk @dongjoon-hyun 




[GitHub] [spark-docker] Yikun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001324162


##########
testing/testing.sh:
##########
@@ -0,0 +1,158 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+CURL_TIMEOUT=1
+CURL_COOLDOWN=1
+CURL_MAX_TRIES=30
+
+NETWORK_NAME=spark-net-bridge
+
+SUBMIT_CONTAINER_NAME=spark-submit
+MASTER_CONTAINER_NAME=spark-master
+WORKER_CONTAINER_NAME=spark-work
+SPARK_MASTER_PORT=7077
+SPARK_MASTER_WEBUI_CONTAINER_PORT=8080
+SPARK_MASTER_WEBUI_HOST_PORT=8080
+SPARK_WORKER_WEBUI_CONTAINER_PORT=8081
+SPARK_WORKER_WEBUI_HOST_PORT=8081
+
+# Create a new docker bridge network
+function create_network() {
+    docker network create --driver bridge "$NETWORK_NAME" > /dev/null
+}
+
+# Remove docker network
+function remove_network() {
+    docker network remove "$NETWORK_NAME" > /dev/null
+}
+
+# Find and kill any remaining containers attached to the network
+function cleanup() {
+    local containers
+    containers="$(docker ps --quiet --filter network="$NETWORK_NAME")"
+
+    if [ -n "$containers" ]; then
+        echo >&2 -n "==> Killing $(echo -n "$containers" | grep -c '^') orphaned container(s)..."
+        echo "$containers" | xargs docker kill > /dev/null
+        echo >&2 " done."
+    fi
+}
+
+function docker_run() {
+    local container_name="$1"
+    local docker_run_command="$2"
+    local args="$3"
+
+    echo >&2 "===> Starting ${container_name}"
+    if [ "$container_name" = "$MASTER_CONTAINER_NAME" ]; then
+      eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    elif [ "$container_name" = "$WORKER_CONTAINER_NAME" ]; then
+      eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    else
+      eval "docker run --rm --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    fi
+}
+
+function start_spark_master() {
+    docker_run \
+      "$MASTER_CONTAINER_NAME" \
+      "--publish $SPARK_MASTER_WEBUI_HOST_PORT:$SPARK_MASTER_WEBUI_CONTAINER_PORT $1" \
+      "/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master" > /dev/null
+}
+
+function start_spark_worker() {
+    docker_run \
+    "$WORKER_CONTAINER_NAME" \
+    "--publish $SPARK_WORKER_WEBUI_HOST_PORT:$SPARK_WORKER_WEBUI_CONTAINER_PORT $1" \
+    "/opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT" > /dev/null
+}
+
+function wait_container_ready() {
+    local container_name="$1"
+    local host_port="$2"
+    i=0
+    echo >&2 "===> Waiting for ${container_name} to be ready..."
+    while true; do
+        i=$((i+1))
+
+        set +e
+
+        curl \
+          --silent \
+          --max-time "$CURL_TIMEOUT" \
+          localhost:"${host_port}" \
+          > /dev/null
+
+        result=$?
+
+        set -e
+
+        if [ "$result" -eq 0 ]; then
+            break
+        fi
+
+        if [ "$i" -gt "$CURL_MAX_TRIES" ]; then
+            echo >&2 "===> \$CURL_MAX_TRIES exceeded waiting for ${container_name} to be ready"
+            return 1
+        fi
+
+        sleep "$CURL_COOLDOWN"
+    done
+
+    echo >&2 "===> ${container_name} is ready."
+}
+
+function run_spark_pi() {
+    docker_run \
+      "$SUBMIT_CONTAINER_NAME" \
+      "$1" \
+      "/opt/spark/bin/spark-submit --master spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT --class org.apache.spark.examples.SparkPi /opt/spark/examples/jars/spark-examples_${scala_spark_version}.jar 20"
+}
+
+# Run smoke test
+function run_smoke_test() {
+    local docker_run_command=$1
+
+    create_network
+    cleanup
+
+    start_spark_master "${docker_run_command}"
+    start_spark_worker "${docker_run_command}"
+
+    wait_container_ready "$MASTER_CONTAINER_NAME" "$SPARK_MASTER_WEBUI_HOST_PORT"
+    wait_container_ready "$WORKER_CONTAINER_NAME" "$SPARK_WORKER_WEBUI_HOST_PORT"
+
+    run_spark_pi "${docker_run_command}"

Review Comment:
   Fine to me, but `set -exo errexit` I guess is redundant https://github.com/apache/spark-docker/pull/15#discussion_r1001322284





[GitHub] [spark-docker] dcoliversun commented on pull request #15: [SPARK-40569] Expose default port for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on PR #15:
URL: https://github.com/apache/spark-docker/pull/15#issuecomment-1283249033

   Thanks for your comment @Yikun 
   It's still a draft; let me update it. BTW, we need a `CONTRIBUTING.md` to help contributors improve the project by updating `Dockerfile.template`. If you agree, I will submit a new PR for it.




[GitHub] [spark-docker] Yikun commented on pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on PR #15:
URL: https://github.com/apache/spark-docker/pull/15#issuecomment-1292909604

   I guess it's ready?
   
   Probably the only potential problem is that the 7077 feels a bit hardcoded. If other people think it is a problem, we can hold off on adding 7077 for now and only add standalone tests with an explicitly specified `-p 7077`.
   
   As for the standalone tests, I think they are very useful. BTW, it seems they reference some parts of [flink-docker's testing_lib.sh](https://github.com/apache/flink-docker/blob/dev-master/testing/testing_lib.sh), reimplemented for Spark; according to my testing they also work well.
   
   So, if you think the `EXPOSE`d port is a problem, we can specify `-p` first; otherwise it all looks good to me.
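   
   Since `EXPOSE` is only image metadata and does not publish the port by itself, an explicit `-p` would look like this (an illustrative command based on the script above, not from the PR):
   
   ```
   # Hypothetical: publish the master RPC and web UI ports explicitly instead of relying on EXPOSE.
   docker run --rm --detach --network spark-net-bridge --name spark-master \
     -p 7077:7077 -p 8080:8080 \
     <image_url> /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
   ```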




[GitHub] [spark-docker] Yikun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1000456477


##########
Dockerfile.template:
##########
@@ -95,4 +95,6 @@ RUN chmod g+w /opt/spark/work-dir
 RUN chmod a+x /opt/decom.sh
 RUN chmod a+x /opt/entrypoint.sh
 
+EXPOSE 7077

Review Comment:
   Add a note for this? Something like:
   
   ```suggestion
   # Expose port for spark master service to listen on
   EXPOSE 7077
   ```



##########
testing/testing.sh:
##########
@@ -0,0 +1,158 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+CURL_TIMEOUT=1
+CURL_COOLDOWN=1
+CURL_MAX_TRIES=30
+
+NETWORK_NAME=spark-net-bridge
+
+SUBMIT_CONTAINER_NAME=spark-submit
+MASTER_CONTAINER_NAME=spark-master
+WORKER_CONTAINER_NAME=spark-work

Review Comment:
   ```suggestion
   WORKER_CONTAINER_NAME=spark-worker
   ```



##########
testing/testing.sh:
##########
@@ -0,0 +1,158 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+CURL_TIMEOUT=1
+CURL_COOLDOWN=1
+CURL_MAX_TRIES=30
+
+NETWORK_NAME=spark-net-bridge
+
+SUBMIT_CONTAINER_NAME=spark-submit
+MASTER_CONTAINER_NAME=spark-master
+WORKER_CONTAINER_NAME=spark-work
+SPARK_MASTER_PORT=7077
+SPARK_MASTER_WEBUI_CONTAINER_PORT=8080
+SPARK_MASTER_WEBUI_HOST_PORT=8080
+SPARK_WORKER_WEBUI_CONTAINER_PORT=8081
+SPARK_WORKER_WEBUI_HOST_PORT=8081
+
+# Create a new docker bridge network
+function create_network() {
+    docker network create --driver bridge "$NETWORK_NAME" > /dev/null
+}
+
+# Remove docker network
+function remove_network() {
+    docker network remove "$NETWORK_NAME" > /dev/null
+}
+
+# Find and kill any remaining containers attached to the network
+function cleanup() {
+    local containers
+    containers="$(docker ps --quiet --filter network="$NETWORK_NAME")"
+
+    if [ -n "$containers" ]; then
+        echo >&2 -n "==> Killing $(echo -n "$containers" | grep -c '^') orphaned container(s)..."
+        echo "$containers" | xargs docker kill > /dev/null
+        echo >&2 " done."
+    fi
+}
+
+function docker_run() {
+    local container_name="$1"
+    local docker_run_command="$2"
+    local args="$3"
+
+    echo >&2 "===> Starting ${container_name}"
+    if [ "$container_name" = "$MASTER_CONTAINER_NAME" ]; then
+      eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    elif [ "$container_name" = "$WORKER_CONTAINER_NAME" ]; then
+      eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    else
+      eval "docker run --rm --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    fi
+}
+
+function start_spark_master() {
+    docker_run \
+      "$MASTER_CONTAINER_NAME" \
+      "--publish $SPARK_MASTER_WEBUI_HOST_PORT:$SPARK_MASTER_WEBUI_CONTAINER_PORT $1" \
+      "/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master" > /dev/null
+}
+
+function start_spark_worker() {
+    docker_run \
+    "$WORKER_CONTAINER_NAME" \
+    "--publish $SPARK_WORKER_WEBUI_HOST_PORT:$SPARK_WORKER_WEBUI_CONTAINER_PORT $1" \
+    "/opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT" > /dev/null
+}
+
+function wait_container_ready() {
+    local container_name="$1"
+    local host_port="$2"
+    i=0
+    echo >&2 "===> Waiting for ${container_name} to be ready..."
+    while true; do
+        i=$((i+1))
+
+        set +e
+
+        curl \
+          --silent \
+          --max-time "$CURL_TIMEOUT" \
+          localhost:"${host_port}" \
+          > /dev/null
+
+        result=$?
+
+        set -e
+
+        if [ "$result" -eq 0 ]; then
+            break
+        fi
+
+        if [ "$i" -gt "$CURL_MAX_TRIES" ]; then
+            echo >&2 "===> \$CURL_MAX_TRIES exceeded waiting for ${container_name} to be ready"
+            return 1
+        fi
+
+        sleep "$CURL_COOLDOWN"
+    done
+
+    echo >&2 "===> ${container_name} is ready."
+}
+
+function run_spark_pi() {
+    docker_run \
+      "$SUBMIT_CONTAINER_NAME" \
+      "$1" \
+      "/opt/spark/bin/spark-submit --master spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT --class org.apache.spark.examples.SparkPi /opt/spark/examples/jars/spark-examples_${scala_spark_version}.jar 20"
+}
+
+# Run smoke test
+function run_smoke_test() {
+    local docker_run_command=$1
+
+    create_network
+    cleanup
+
+    start_spark_master "${docker_run_command}"
+    start_spark_worker "${docker_run_command}"
+
+    wait_container_ready "$MASTER_CONTAINER_NAME" "$SPARK_MASTER_WEBUI_HOST_PORT"
+    wait_container_ready "$WORKER_CONTAINER_NAME" "$SPARK_WORKER_WEBUI_HOST_PORT"
+
+    run_spark_pi "${docker_run_command}"
+
+    cleanup
+    remove_network
+}
+
+# Run a master and work and verify they start up and connect to each other successfully.

Review Comment:
   ```suggestion
   # Run a master and worker and verify they start up and connect to each other successfully.
   ```



##########
.github/workflows/main.yml:
##########
@@ -155,6 +155,9 @@ jobs:
           path: ~/.cache/coursier
           key: build-${{ matrix.spark_version }}-scala${{ matrix.scala_version }}-java${{ matrix.java_version }}-coursier
 
+      - name : Test - Run spark application for standalone cluster on docker

Review Comment:
   Move this test before `Test - Checkout Spark repository`.
   
   It seems we don't need the Spark code, because this test is completely Docker based.



##########
testing/testing.sh:
##########
@@ -0,0 +1,158 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+CURL_TIMEOUT=1
+CURL_COOLDOWN=1
+CURL_MAX_TRIES=30
+
+NETWORK_NAME=spark-net-bridge
+
+SUBMIT_CONTAINER_NAME=spark-submit
+MASTER_CONTAINER_NAME=spark-master
+WORKER_CONTAINER_NAME=spark-work
+SPARK_MASTER_PORT=7077
+SPARK_MASTER_WEBUI_CONTAINER_PORT=8080
+SPARK_MASTER_WEBUI_HOST_PORT=8080
+SPARK_WORKER_WEBUI_CONTAINER_PORT=8081
+SPARK_WORKER_WEBUI_HOST_PORT=8081
+
+# Create a new docker bridge network
+function create_network() {
+    docker network create --driver bridge "$NETWORK_NAME" > /dev/null
+}
+
+# Remove docker network
+function remove_network() {
+    docker network remove "$NETWORK_NAME" > /dev/null
+}
+
+# Find and kill any remaining containers attached to the network
+function cleanup() {
+    local containers
+    containers="$(docker ps --quiet --filter network="$NETWORK_NAME")"
+
+    if [ -n "$containers" ]; then
+        echo >&2 -n "==> Killing $(echo -n "$containers" | grep -c '^') orphaned container(s)..."
+        echo "$containers" | xargs docker kill > /dev/null
+        echo >&2 " done."
+    fi
+}
+
+function docker_run() {
+    local container_name="$1"
+    local docker_run_command="$2"
+    local args="$3"
+
+    echo >&2 "===> Starting ${container_name}"
+    if [ "$container_name" = "$MASTER_CONTAINER_NAME" ]; then
+      eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    elif [ "$container_name" = "$WORKER_CONTAINER_NAME" ]; then
+      eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    else
+      eval "docker run --rm --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    fi
+}
+
+function start_spark_master() {
+    docker_run \
+      "$MASTER_CONTAINER_NAME" \
+      "--publish $SPARK_MASTER_WEBUI_HOST_PORT:$SPARK_MASTER_WEBUI_CONTAINER_PORT $1" \
+      "/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master" > /dev/null
+}
+
+function start_spark_worker() {
+    docker_run \
+    "$WORKER_CONTAINER_NAME" \
+    "--publish $SPARK_WORKER_WEBUI_HOST_PORT:$SPARK_WORKER_WEBUI_CONTAINER_PORT $1" \
+    "/opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT" > /dev/null
+}
+
+function wait_container_ready() {
+    local container_name="$1"
+    local host_port="$2"
+    i=0
+    echo >&2 "===> Waiting for ${container_name} to be ready..."
+    while true; do
+        i=$((i+1))
+
+        set +e
+
+        curl \
+          --silent \
+          --max-time "$CURL_TIMEOUT" \
+          localhost:"${host_port}" \
+          > /dev/null
+
+        result=$?
+
+        set -e
+
+        if [ "$result" -eq 0 ]; then
+            break
+        fi
+
+        if [ "$i" -gt "$CURL_MAX_TRIES" ]; then
+            echo >&2 "===> \$CURL_MAX_TRIES exceeded waiting for ${container_name} to be ready"
+            return 1
+        fi
+
+        sleep "$CURL_COOLDOWN"
+    done
+
+    echo >&2 "===> ${container_name} is ready."
+}
+
+function run_spark_pi() {
+    docker_run \
+      "$SUBMIT_CONTAINER_NAME" \
+      "$1" \
+      "/opt/spark/bin/spark-submit --master spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT --class org.apache.spark.examples.SparkPi /opt/spark/examples/jars/spark-examples_${scala_spark_version}.jar 20"
+}
+
+# Run smoke test
+function run_smoke_test() {

Review Comment:
   nit:
   Maybe `run_standalone_test` or `run_smoke_standalone_test`? We will add some more tests besides standalone later.



##########
.github/workflows/main.yml:
##########
@@ -155,6 +155,9 @@ jobs:
           path: ~/.cache/coursier
           key: build-${{ matrix.spark_version }}-scala${{ matrix.scala_version }}-java${{ matrix.java_version }}-coursier
 
+      - name : Test - Run spark application for standalone cluster on docker
+        run: testing/run_tests.sh ${{ matrix.scala_version }}-${{ matrix.spark_version }}

Review Comment:
   Would you mind adding `--scala-version` and `--spark-version` options? You can reference this:
   
   https://github.com/apache/spark/blob/0643d02e4f03cdadb53efc05af0b6533d22db297/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh#L54
   
   We might also add more tests for `pyspark`/`spark-shell` in the future.
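   
   A minimal sketch of the kind of option parsing meant here (flag names and defaults are assumptions, not from the PR):
   
   ```
   # Hypothetical option parsing for testing.sh; flags/defaults are assumed.
   SCALA_VERSION=2.12
   SPARK_VERSION=3.3.0
   while (( "$#" )); do
     case $1 in
       --scala-version) SCALA_VERSION="$2"; shift ;;
       --spark-version) SPARK_VERSION="$2"; shift ;;
       *) break ;;
     esac
     shift
   done
   ```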
   
   



##########
testing/testing.sh:
##########
@@ -0,0 +1,158 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+CURL_TIMEOUT=1
+CURL_COOLDOWN=1
+CURL_MAX_TRIES=30
+
+NETWORK_NAME=spark-net-bridge
+
+SUBMIT_CONTAINER_NAME=spark-submit
+MASTER_CONTAINER_NAME=spark-master
+WORKER_CONTAINER_NAME=spark-work
+SPARK_MASTER_PORT=7077
+SPARK_MASTER_WEBUI_CONTAINER_PORT=8080
+SPARK_MASTER_WEBUI_HOST_PORT=8080
+SPARK_WORKER_WEBUI_CONTAINER_PORT=8081
+SPARK_WORKER_WEBUI_HOST_PORT=8081
+
+# Create a new docker bridge network
+function create_network() {
+    docker network create --driver bridge "$NETWORK_NAME" > /dev/null
+}
+
+# Remove docker network
+function remove_network() {
+    docker network remove "$NETWORK_NAME" > /dev/null

Review Comment:
   nit: it works, but it's better to use:
   
   ```suggestion
       docker network rm "$NETWORK_NAME" > /dev/null
   ```
   
   https://docs.docker.com/engine/reference/commandline/network_rm/



##########
testing/testing.sh:
##########
@@ -0,0 +1,158 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+CURL_TIMEOUT=1
+CURL_COOLDOWN=1
+CURL_MAX_TRIES=30
+
+NETWORK_NAME=spark-net-bridge
+
+SUBMIT_CONTAINER_NAME=spark-submit
+MASTER_CONTAINER_NAME=spark-master
+WORKER_CONTAINER_NAME=spark-work
+SPARK_MASTER_PORT=7077
+SPARK_MASTER_WEBUI_CONTAINER_PORT=8080
+SPARK_MASTER_WEBUI_HOST_PORT=8080
+SPARK_WORKER_WEBUI_CONTAINER_PORT=8081
+SPARK_WORKER_WEBUI_HOST_PORT=8081
+
+# Create a new docker bridge network
+function create_network() {
+    docker network create --driver bridge "$NETWORK_NAME" > /dev/null
+}
+
+# Remove docker network
+function remove_network() {
+    docker network remove "$NETWORK_NAME" > /dev/null
+}
+
+# Find and kill any remaining containers attached to the network
+function cleanup() {
+    local containers
+    containers="$(docker ps --quiet --filter network="$NETWORK_NAME")"
+
+    if [ -n "$containers" ]; then
+        echo >&2 -n "==> Killing $(echo -n "$containers" | grep -c '^') orphaned container(s)..."
+        echo "$containers" | xargs docker kill > /dev/null
+        echo >&2 " done."
+    fi
+}
+
+function docker_run() {
+    local container_name="$1"
+    local docker_run_command="$2"
+    local args="$3"
+
+    echo >&2 "===> Starting ${container_name}"
+    if [ "$container_name" = "$MASTER_CONTAINER_NAME" ]; then
+      eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    elif [ "$container_name" = "$WORKER_CONTAINER_NAME" ]; then
+      eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    else
+      eval "docker run --rm --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    fi
+}
+
+function start_spark_master() {
+    docker_run \
+      "$MASTER_CONTAINER_NAME" \
+      "--publish $SPARK_MASTER_WEBUI_HOST_PORT:$SPARK_MASTER_WEBUI_CONTAINER_PORT $1" \
+      "/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master" > /dev/null
+}
+
+function start_spark_worker() {
+    docker_run \
+    "$WORKER_CONTAINER_NAME" \
+    "--publish $SPARK_WORKER_WEBUI_HOST_PORT:$SPARK_WORKER_WEBUI_CONTAINER_PORT $1" \
+    "/opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT" > /dev/null
+}
+
+function wait_container_ready() {
+    local container_name="$1"
+    local host_port="$2"
+    i=0
+    echo >&2 "===> Waiting for ${container_name} to be ready..."
+    while true; do
+        i=$((i+1))
+
+        set +e
+
+        curl \
+          --silent \
+          --max-time "$CURL_TIMEOUT" \
+          localhost:"${host_port}" \
+          > /dev/null
+
+        result=$?
+
+        set -e
+
+        if [ "$result" -eq 0 ]; then
+            break
+        fi
+
+        if [ "$i" -gt "$CURL_MAX_TRIES" ]; then
+            echo >&2 "===> \$CURL_MAX_TRIES exceeded waiting for ${container_name} to be ready"
+            return 1
+        fi
+
+        sleep "$CURL_COOLDOWN"
+    done
+
+    echo >&2 "===> ${container_name} is ready."
+}
+
+function run_spark_pi() {
+    docker_run \
+      "$SUBMIT_CONTAINER_NAME" \
+      "$1" \
+      "/opt/spark/bin/spark-submit --master spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT --class org.apache.spark.examples.SparkPi /opt/spark/examples/jars/spark-examples_${scala_spark_version}.jar 20"
+}
+
+# Run smoke test
+function run_smoke_test() {
+    local docker_run_command=$1
+
+    create_network
+    cleanup
+
+    start_spark_master "${docker_run_command}"
+    start_spark_worker "${docker_run_command}"
+
+    wait_container_ready "$MASTER_CONTAINER_NAME" "$SPARK_MASTER_WEBUI_HOST_PORT"
+    wait_container_ready "$WORKER_CONTAINER_NAME" "$SPARK_WORKER_WEBUI_HOST_PORT"
+
+    run_spark_pi "${docker_run_command}"

Review Comment:
   Do we want to validate that the output contains `Pi is roughly 3`?
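   
   For example (an illustrative check, not from the PR), the submit output could be asserted with a plain `grep`, which under `bash -e` fails the whole script when the line is missing:
   
   ```
   # Hypothetical assertion: fail the smoke test if SparkPi did not print its result.
   run_spark_pi "${docker_run_command}" | grep "Pi is roughly 3"
   ```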



##########
testing/testing.sh:
##########
@@ -0,0 +1,158 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+CURL_TIMEOUT=1
+CURL_COOLDOWN=1
+CURL_MAX_TRIES=30
+
+NETWORK_NAME=spark-net-bridge
+
+SUBMIT_CONTAINER_NAME=spark-submit
+MASTER_CONTAINER_NAME=spark-master
+WORKER_CONTAINER_NAME=spark-work
+SPARK_MASTER_PORT=7077
+SPARK_MASTER_WEBUI_CONTAINER_PORT=8080
+SPARK_MASTER_WEBUI_HOST_PORT=8080
+SPARK_WORKER_WEBUI_CONTAINER_PORT=8081
+SPARK_WORKER_WEBUI_HOST_PORT=8081
+
+# Create a new docker bridge network
+function create_network() {
+    docker network create --driver bridge "$NETWORK_NAME" > /dev/null
+}
+
+# Remove docker network
+function remove_network() {
+    docker network remove "$NETWORK_NAME" > /dev/null
+}
+
+# Find and kill any remaining containers attached to the network
+function cleanup() {
+    local containers
+    containers="$(docker ps --quiet --filter network="$NETWORK_NAME")"
+
+    if [ -n "$containers" ]; then
+        echo >&2 -n "==> Killing $(echo -n "$containers" | grep -c '^') orphaned container(s)..."
+        echo "$containers" | xargs docker kill > /dev/null
+        echo >&2 " done."
+    fi
+}
+
+function docker_run() {
+    local container_name="$1"
+    local docker_run_command="$2"
+    local args="$3"
+
+    echo >&2 "===> Starting ${container_name}"
+    if [ "$container_name" = "$MASTER_CONTAINER_NAME" ]; then
+      eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    elif [ "$container_name" = "$WORKER_CONTAINER_NAME" ]; then
+      eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"

Review Comment:
   1. Looks like these two branches are the same; maybe we could merge them into one?
   2. Would you mind adding a note about `--detach`?
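   
   For reference, a minimal sketch of the merged branch (the revised script later in this thread does essentially this):
   
   ```
   # --detach: run spark-master and spark-worker in the background, like spark-daemon.sh behaves
   if [ "$container_name" = "$MASTER_CONTAINER_NAME" ] || [ "$container_name" = "$WORKER_CONTAINER_NAME" ]; then
     eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
   else
     eval "docker run --rm --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
   fi
   ```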



##########
testing/testing.sh:
##########
@@ -0,0 +1,158 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+

Review Comment:
   Add some usage notes



##########
testing/testing.sh:
##########
@@ -0,0 +1,158 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+CURL_TIMEOUT=1
+CURL_COOLDOWN=1
+CURL_MAX_TRIES=30
+
+NETWORK_NAME=spark-net-bridge
+
+SUBMIT_CONTAINER_NAME=spark-submit
+MASTER_CONTAINER_NAME=spark-master
+WORKER_CONTAINER_NAME=spark-work
+SPARK_MASTER_PORT=7077
+SPARK_MASTER_WEBUI_CONTAINER_PORT=8080
+SPARK_MASTER_WEBUI_HOST_PORT=8080
+SPARK_WORKER_WEBUI_CONTAINER_PORT=8081
+SPARK_WORKER_WEBUI_HOST_PORT=8081
+
+# Create a new docker bridge network
+function create_network() {
+    docker network create --driver bridge "$NETWORK_NAME" > /dev/null
+}
+
+# Remove docker network
+function remove_network() {
+    docker network remove "$NETWORK_NAME" > /dev/null
+}
+
+# Find and kill any remaining containers attached to the network
+function cleanup() {
+    local containers
+    containers="$(docker ps --quiet --filter network="$NETWORK_NAME")"
+
+    if [ -n "$containers" ]; then
+        echo >&2 -n "==> Killing $(echo -n "$containers" | grep -c '^') orphaned container(s)..."
+        echo "$containers" | xargs docker kill > /dev/null
+        echo >&2 " done."
+    fi
+}
+
+function docker_run() {
+    local container_name="$1"
+    local docker_run_command="$2"
+    local args="$3"
+
+    echo >&2 "===> Starting ${container_name}"
+    if [ "$container_name" = "$MASTER_CONTAINER_NAME" ]; then
+      eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    elif [ "$container_name" = "$WORKER_CONTAINER_NAME" ]; then
+      eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    else
+      eval "docker run --rm --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    fi
+}
+
+function start_spark_master() {
+    docker_run \
+      "$MASTER_CONTAINER_NAME" \
+      "--publish $SPARK_MASTER_WEBUI_HOST_PORT:$SPARK_MASTER_WEBUI_CONTAINER_PORT $1" \
+      "/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master" > /dev/null
+}
+
+function start_spark_worker() {
+    docker_run \
+    "$WORKER_CONTAINER_NAME" \
+    "--publish $SPARK_WORKER_WEBUI_HOST_PORT:$SPARK_WORKER_WEBUI_CONTAINER_PORT $1" \
+    "/opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT" > /dev/null
+}
+
+function wait_container_ready() {
+    local container_name="$1"
+    local host_port="$2"
+    i=0
+    echo >&2 "===> Waiting for ${container_name} to be ready..."
+    while true; do
+        i=$((i+1))
+
+        set +e
+
+        curl \
+          --silent \
+          --max-time "$CURL_TIMEOUT" \
+          localhost:"${host_port}" \
+          > /dev/null
+
+        result=$?
+
+        set -e
+
+        if [ "$result" -eq 0 ]; then
+            break
+        fi
+
+        if [ "$i" -gt "$CURL_MAX_TRIES" ]; then
+            echo >&2 "===> \$CURL_MAX_TRIES exceeded waiting for ${container_name} to be ready"
+            return 1
+        fi
+
+        sleep "$CURL_COOLDOWN"
+    done
+
+    echo >&2 "===> ${container_name} is ready."
+}
+
+function run_spark_pi() {
+    docker_run \
+      "$SUBMIT_CONTAINER_NAME" \
+      "$1" \
+      "/opt/spark/bin/spark-submit --master spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT --class org.apache.spark.examples.SparkPi /opt/spark/examples/jars/spark-examples_${scala_spark_version}.jar 20"
+}
+
+# Run smoke test
+function run_smoke_test() {
+    local docker_run_command=$1
+
+    create_network
+    cleanup
+
+    start_spark_master "${docker_run_command}"
+    start_spark_worker "${docker_run_command}"
+
+    wait_container_ready "$MASTER_CONTAINER_NAME" "$SPARK_MASTER_WEBUI_HOST_PORT"
+    wait_container_ready "$WORKER_CONTAINER_NAME" "$SPARK_WORKER_WEBUI_HOST_PORT"
+
+    run_spark_pi "${docker_run_command}"
+
+    cleanup
+    remove_network
+}
+
+# Run a master and work and verify they start up and connect to each other successfully.
+# And run a Spark Pi to complete smoke test.
+function smoke_test() {
+    local scala_spark_version="$1"
+    local image_url=$TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG
+
+    echo >&2 "===> Smoke test for $image_url"
+    run_smoke_test ""

Review Comment:
   nit:
   ```suggestion
       run_smoke_test
   ```





[GitHub] [spark-docker] Yikun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001320419


##########
testing/testing.sh:
##########
@@ -0,0 +1,164 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This test script runs a simple smoke test in standalone cluster:
+# - create docker network
+# - start up a master
+# - start up a worker
+# - wait for the web UI endpoint to return successfully
+# - run a simple smoke test in standalone cluster
+# - clean up test resource
+
+CURL_TIMEOUT=1
+CURL_COOLDOWN=1
+CURL_MAX_TRIES=30
+
+NETWORK_NAME=spark-net-bridge
+
+SUBMIT_CONTAINER_NAME=spark-submit
+MASTER_CONTAINER_NAME=spark-master
+WORKER_CONTAINER_NAME=spark-worker
+SPARK_MASTER_PORT=7077
+SPARK_MASTER_WEBUI_CONTAINER_PORT=8080
+SPARK_MASTER_WEBUI_HOST_PORT=8080
+SPARK_WORKER_WEBUI_CONTAINER_PORT=8081
+SPARK_WORKER_WEBUI_HOST_PORT=8081
+
+# Create a new docker bridge network
+function create_network() {
+    docker network create --driver bridge "$NETWORK_NAME" > /dev/null
+}
+
+# Remove docker network
+function remove_network() {
+    docker network rm "$NETWORK_NAME" > /dev/null
+}
+
+# Find and kill any remaining containers attached to the network
+function cleanup() {
+  local containers
+  containers="$(docker ps --quiet --filter network="$NETWORK_NAME")"
+
+  if [ -n "$containers" ]; then
+    echo >&2 -n "==> Killing $(echo -n "$containers" | grep -c '^') orphaned container(s)..."
+    echo "$containers" | xargs docker kill > /dev/null
+    echo >&2 " done."
+  fi
+}
+
+function docker_run() {
+  local container_name="$1"
+  local docker_run_command="$2"
+  local args="$3"
+
+  echo >&2 "===> Starting ${container_name}"
+  if [ "$container_name" = "$MASTER_CONTAINER_NAME" -o "$container_name" = "$WORKER_CONTAINER_NAME" ]; then
+    # --detach: Run spark-master and spark-worker in background, like spark-daemon.sh behaves
+    eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+  else
+    eval "docker run --rm --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+  fi
+}
+
+function start_spark_master() {
+  docker_run \
+    "$MASTER_CONTAINER_NAME" \
+    "--publish $SPARK_MASTER_WEBUI_HOST_PORT:$SPARK_MASTER_WEBUI_CONTAINER_PORT $1" \
+    "/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master" > /dev/null
+}
+
+function start_spark_worker() {
+  docker_run \
+    "$WORKER_CONTAINER_NAME" \
+    "--publish $SPARK_WORKER_WEBUI_HOST_PORT:$SPARK_WORKER_WEBUI_CONTAINER_PORT $1" \
+    "/opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT" > /dev/null
+}
+
+function wait_container_ready() {
+  local container_name="$1"
+  local host_port="$2"
+  i=0
+  echo >&2 "===> Waiting for ${container_name} to be ready..."
+  while true; do
+    i=$((i+1))
+
+    set +e
+
+    curl \
+      --silent \
+      --max-time "$CURL_TIMEOUT" \
+      localhost:"${host_port}" \
+      > /dev/null
+
+    result=$?
+
+    set -e
+
+    if [ "$result" -eq 0 ]; then
+      break
+    fi
+
+    if [ "$i" -gt "$CURL_MAX_TRIES" ]; then
+      echo >&2 "===> \$CURL_MAX_TRIES exceeded waiting for ${container_name} to be ready"
+      return 1
+    fi
+
+    sleep "$CURL_COOLDOWN"
+  done
+
+  echo >&2 "===> ${container_name} is ready."
+}
+
+function run_spark_pi() {
+  docker_run \
+    "$SUBMIT_CONTAINER_NAME" \
+    "$1" \
+    "/opt/spark/bin/spark-submit --master spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT --class org.apache.spark.examples.SparkPi /opt/spark/examples/jars/spark-examples_$SCALA_VERSION-$SPARK_VERSION.jar 20"
+}
+
+# Run smoke standalone test
+function run_smoke_test_in_standalone() {
+  local docker_run_command=$1
+
+  create_network
+  cleanup
+
+  start_spark_master "${docker_run_command}"
+  start_spark_worker "${docker_run_command}"
+
+  wait_container_ready "$MASTER_CONTAINER_NAME" "$SPARK_MASTER_WEBUI_HOST_PORT"
+  wait_container_ready "$WORKER_CONTAINER_NAME" "$SPARK_WORKER_WEBUI_HOST_PORT"
+
+  run_spark_pi "${docker_run_command}"
+
+  cleanup
+  remove_network
+}
+
+# Run a master and worker and verify they start up and connect to each other successfully.
+# And run a smoke test in standalone cluster.
+function smoke_test() {
+  local image_url=$TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG

Review Comment:
   Do we want to make `image_url` configurable?
   
   Then we could also run `testing.sh image_url --scala-version xxx --java-version yyy` as a local test.
   
   And `testing.sh $TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG --scala-version xxx --java-version yyy` in GitHub Actions.
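   
   A minimal sketch inside `smoke_test()` (the positional-argument convention and fallback default are assumptions):
   
   ```
   # Hypothetical: take image_url from the first positional argument,
   # falling back to the CI-provided repo/name/tag when not given.
   local image_url="${1:-$TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG}"
   ```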





[GitHub] [spark-docker] Yikun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001403695


##########
.github/workflows/main.yml:
##########
@@ -60,6 +60,9 @@ jobs:
           - ${{ inputs.java }}
         image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu]
     steps:
+      - name : Test - Run spark application for standalone cluster on docker

Review Comment:
   `Checkout Spark repository` == `Checkout apache/spark-docker repository`
   `Test - Checkout Spark repository` == `Checkout apache/spark repository`
   
   We need `apache/spark` at the specific version only when we run the K8s test.





[GitHub] [spark-docker] dcoliversun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001439240


##########
testing/testing.sh:
##########
@@ -0,0 +1,164 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This test script runs a simple smoke test in standalone cluster:
+# - create docker network
+# - start up a master
+# - start up a worker
+# - wait for the web UI endpoint to return successfully
+# - run a simple smoke test in standalone cluster
+# - clean up test resource
+
+CURL_TIMEOUT=1
+CURL_COOLDOWN=1
+CURL_MAX_TRIES=30
+
+NETWORK_NAME=spark-net-bridge
+
+SUBMIT_CONTAINER_NAME=spark-submit
+MASTER_CONTAINER_NAME=spark-master
+WORKER_CONTAINER_NAME=spark-worker
+SPARK_MASTER_PORT=7077
+SPARK_MASTER_WEBUI_CONTAINER_PORT=8080
+SPARK_MASTER_WEBUI_HOST_PORT=8080
+SPARK_WORKER_WEBUI_CONTAINER_PORT=8081
+SPARK_WORKER_WEBUI_HOST_PORT=8081
+
+# Create a new docker bridge network
+function create_network() {
+    docker network create --driver bridge "$NETWORK_NAME" > /dev/null
+}
+
+# Remove docker network
+function remove_network() {
+    docker network rm "$NETWORK_NAME" > /dev/null
+}
+
+# Find and kill any remaining containers attached to the network
+function cleanup() {
+  local containers
+  containers="$(docker ps --quiet --filter network="$NETWORK_NAME")"
+
+  if [ -n "$containers" ]; then
+    echo >&2 -n "==> Killing $(echo -n "$containers" | grep -c '^') orphaned container(s)..."
+    echo "$containers" | xargs docker kill > /dev/null
+    echo >&2 " done."
+  fi
+}
+
+function docker_run() {
+  local container_name="$1"
+  local docker_run_command="$2"
+  local args="$3"
+
+  echo >&2 "===> Starting ${container_name}"
+  if [ "$container_name" = "$MASTER_CONTAINER_NAME" -o "$container_name" = "$WORKER_CONTAINER_NAME" ]; then
+    # --detach: Run spark-master and spark-worker in background, like spark-daemon.sh behaves
+    eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+  else
+    eval "docker run --rm --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+  fi
+}
+
+function start_spark_master() {
+  docker_run \
+    "$MASTER_CONTAINER_NAME" \
+    "--publish $SPARK_MASTER_WEBUI_HOST_PORT:$SPARK_MASTER_WEBUI_CONTAINER_PORT $1" \
+    "/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master" > /dev/null
+}
+
+function start_spark_worker() {
+  docker_run \
+    "$WORKER_CONTAINER_NAME" \
+    "--publish $SPARK_WORKER_WEBUI_HOST_PORT:$SPARK_WORKER_WEBUI_CONTAINER_PORT $1" \
+    "/opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT" > /dev/null
+}
+
+function wait_container_ready() {
+  local container_name="$1"
+  local host_port="$2"
+  i=0
+  echo >&2 "===> Waiting for ${container_name} to be ready..."
+  while true; do
+    i=$((i+1))
+
+    set +e
+
+    curl \
+      --silent \
+      --max-time "$CURL_TIMEOUT" \
+      localhost:"${host_port}" \
+      > /dev/null
+
+    result=$?
+
+    set -e
+
+    if [ "$result" -eq 0 ]; then
+      break
+    fi
+
+    if [ "$i" -gt "$CURL_MAX_TRIES" ]; then
+      echo >&2 "===> \$CURL_MAX_TRIES exceeded waiting for ${container_name} to be ready"
+      return 1
+    fi
+
+    sleep "$CURL_COOLDOWN"
+  done
+
+  echo >&2 "===> ${container_name} is ready."
+}
+
+function run_spark_pi() {
+  docker_run \
+    "$SUBMIT_CONTAINER_NAME" \
+    "$1" \
+    "/opt/spark/bin/spark-submit --master spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT --class org.apache.spark.examples.SparkPi /opt/spark/examples/jars/spark-examples_$SCALA_VERSION-$SPARK_VERSION.jar 20"
+}
+
+# Run smoke standalone test
+function run_smoke_test_in_standalone() {
+  local docker_run_command=$1
+
+  create_network
+  cleanup
+
+  start_spark_master "${docker_run_command}"
+  start_spark_worker "${docker_run_command}"
+
+  wait_container_ready "$MASTER_CONTAINER_NAME" "$SPARK_MASTER_WEBUI_HOST_PORT"
+  wait_container_ready "$WORKER_CONTAINER_NAME" "$SPARK_WORKER_WEBUI_HOST_PORT"
+
+  run_spark_pi "${docker_run_command}"
+
+  cleanup
+  remove_network
+}
+
+# Run a master and worker and verify they start up and connect to each other successfully.
+# And run a smoke test in standalone cluster.
+function smoke_test() {
+  local image_url=$TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG

Review Comment:
   Good idea. Add a `--image-url` option.





[GitHub] [spark-docker] Yikun commented on pull request #15: [SPARK-40569] Expose default port for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on PR #15:
URL: https://github.com/apache/spark-docker/pull/15#issuecomment-1283259645

   @dcoliversun Yes, definitely needed. Like https://github.com/apache/spark/blob/master/CONTRIBUTING.md
   
   BTW, we should also add a simple test (such as a script that starts Spark, submits a test job, and curls some results) for the `EXPOSE`d port.
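   
   Something along these lines, as a rough sketch (the image tag and ports are placeholders; the master class matches the scripts later in this thread):
   
   ```bash
   # Start a master with its web UI published to the host (hypothetical image tag):
   docker run --rm --detach --name spark-master --publish 8080:8080 \
     my-repo/spark:3.3.0 \
     /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
   
   # Poll the published UI port until it answers, or give up after 30 tries:
   for i in $(seq 1 30); do
     if curl --silent --max-time 1 localhost:8080 > /dev/null; then
       echo "master UI is up"; exit 0
     fi
     sleep 1
   done
   echo "master UI never became ready" >&2; exit 1
   ```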




[GitHub] [spark-docker] Yikun commented on pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on PR #15:
URL: https://github.com/apache/spark-docker/pull/15#issuecomment-1286598701

   cc @HyukjinKwon @zhengruifeng Mind taking a look? Thanks!




[GitHub] [spark-docker] dcoliversun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001438867


##########
testing/run_tests.sh:
##########
@@ -0,0 +1,49 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+set -exo errexit

Review Comment:
   Thanks. `-x` removed, and `#!/bin/bash -e` changed to `#!/bin/bash`.





[GitHub] [spark-docker] dcoliversun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001287867


##########
testing/testing.sh:
##########
@@ -0,0 +1,158 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+CURL_TIMEOUT=1
+CURL_COOLDOWN=1
+CURL_MAX_TRIES=30
+
+NETWORK_NAME=spark-net-bridge
+
+SUBMIT_CONTAINER_NAME=spark-submit
+MASTER_CONTAINER_NAME=spark-master
+WORKER_CONTAINER_NAME=spark-work
+SPARK_MASTER_PORT=7077
+SPARK_MASTER_WEBUI_CONTAINER_PORT=8080
+SPARK_MASTER_WEBUI_HOST_PORT=8080
+SPARK_WORKER_WEBUI_CONTAINER_PORT=8081
+SPARK_WORKER_WEBUI_HOST_PORT=8081
+
+# Create a new docker bridge network
+function create_network() {
+    docker network create --driver bridge "$NETWORK_NAME" > /dev/null
+}
+
+# Remove docker network
+function remove_network() {
+    docker network remove "$NETWORK_NAME" > /dev/null
+}
+
+# Find and kill any remaining containers attached to the network
+function cleanup() {
+    local containers
+    containers="$(docker ps --quiet --filter network="$NETWORK_NAME")"
+
+    if [ -n "$containers" ]; then
+        echo >&2 -n "==> Killing $(echo -n "$containers" | grep -c '^') orphaned container(s)..."
+        echo "$containers" | xargs docker kill > /dev/null
+        echo >&2 " done."
+    fi
+}
+
+function docker_run() {
+    local container_name="$1"
+    local docker_run_command="$2"
+    local args="$3"
+
+    echo >&2 "===> Starting ${container_name}"
+    if [ "$container_name" = "$MASTER_CONTAINER_NAME" ]; then
+      eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    elif [ "$container_name" = "$WORKER_CONTAINER_NAME" ]; then
+      eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    else
+      eval "docker run --rm --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+    fi
+}
+
+function start_spark_master() {
+    docker_run \
+      "$MASTER_CONTAINER_NAME" \
+      "--publish $SPARK_MASTER_WEBUI_HOST_PORT:$SPARK_MASTER_WEBUI_CONTAINER_PORT $1" \
+      "/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master" > /dev/null
+}
+
+function start_spark_worker() {
+    docker_run \
+    "$WORKER_CONTAINER_NAME" \
+    "--publish $SPARK_WORKER_WEBUI_HOST_PORT:$SPARK_WORKER_WEBUI_CONTAINER_PORT $1" \
+    "/opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT" > /dev/null
+}
+
+function wait_container_ready() {
+    local container_name="$1"
+    local host_port="$2"
+    i=0
+    echo >&2 "===> Waiting for ${container_name} to be ready..."
+    while true; do
+        i=$((i+1))
+
+        set +e
+
+        curl \
+          --silent \
+          --max-time "$CURL_TIMEOUT" \
+          localhost:"${host_port}" \
+          > /dev/null
+
+        result=$?
+
+        set -e
+
+        if [ "$result" -eq 0 ]; then
+            break
+        fi
+
+        if [ "$i" -gt "$CURL_MAX_TRIES" ]; then
+            echo >&2 "===> \$CURL_MAX_TRIES exceeded waiting for ${container_name} to be ready"
+            return 1
+        fi
+
+        sleep "$CURL_COOLDOWN"
+    done
+
+    echo >&2 "===> ${container_name} is ready."
+}
+
+function run_spark_pi() {
+    docker_run \
+      "$SUBMIT_CONTAINER_NAME" \
+      "$1" \
+      "/opt/spark/bin/spark-submit --master spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT --class org.apache.spark.examples.SparkPi /opt/spark/examples/jars/spark-examples_${scala_spark_version}.jar 20"
+}
+
+# Run smoke test
+function run_smoke_test() {
+    local docker_run_command=$1
+
+    create_network
+    cleanup
+
+    start_spark_master "${docker_run_command}"
+    start_spark_worker "${docker_run_command}"
+
+    wait_container_ready "$MASTER_CONTAINER_NAME" "$SPARK_MASTER_WEBUI_HOST_PORT"
+    wait_container_ready "$WORKER_CONTAINER_NAME" "$SPARK_WORKER_WEBUI_HOST_PORT"
+
+    run_spark_pi "${docker_run_command}"

Review Comment:
   I think it's better to validate with the exit code; it's more general. `set -exo errexit` added in `run_tests.sh` first.
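   
   A minimal sketch of exit-code validation, using the function above (under `set -e` no explicit check is needed, since a failing `spark-submit` already aborts the script):
   
   ```bash
   # Without -e, the exit code can be checked explicitly:
   if run_spark_pi "${docker_run_command}"; then
     echo "===> SparkPi succeeded"
   else
     echo "===> SparkPi failed with exit code $?" >&2
     exit 1
   fi
   ```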





[GitHub] [spark-docker] dcoliversun closed pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
dcoliversun closed pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster
URL: https://github.com/apache/spark-docker/pull/15




[GitHub] [spark-docker] Yikun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001322284


##########
testing/run_tests.sh:
##########
@@ -0,0 +1,49 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+set -exo errexit

Review Comment:
   `#!/bin/bash -e` is already equivalent to `set -e`.
   
   -e  Exit immediately if a command exits with a non-zero status.
   -o option-name
   -x  Print commands and their arguments as they are executed.
   
   





[GitHub] [spark-docker] Yikun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001440707


##########
.github/workflows/main.yml:
##########
@@ -60,6 +60,9 @@ jobs:
           - ${{ inputs.java }}
         image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu]
     steps:
+      - name : Test - Run spark application for standalone cluster on docker

Review Comment:
   Yep, feel free to rename it





[GitHub] [spark-docker] dcoliversun commented on pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on PR #15:
URL: https://github.com/apache/spark-docker/pull/15#issuecomment-1284903674

   cc @HyukjinKwon @Yikun 
   It would be good if you have time to review this




[GitHub] [spark-docker] Yikun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1000504095


##########
.github/workflows/main.yml:
##########
@@ -155,6 +155,9 @@ jobs:
           path: ~/.cache/coursier
           key: build-${{ matrix.spark_version }}-scala${{ matrix.scala_version }}-java${{ matrix.java_version }}-coursier
 
+      - name : Test - Run spark application for standalone cluster on docker

Review Comment:
   Move this test before `Test - Checkout Spark repository`.
   
   It seems we don't need local spark code for standalone, because it's completely docker based.





[GitHub] [spark-docker] Yikun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1000504095


##########
.github/workflows/main.yml:
##########
@@ -155,6 +155,9 @@ jobs:
           path: ~/.cache/coursier
           key: build-${{ matrix.spark_version }}-scala${{ matrix.scala_version }}-java${{ matrix.java_version }}-coursier
 
+      - name : Test - Run spark application for standalone cluster on docker

Review Comment:
   Move this test before `Test - Checkout Spark repository`.
   
   Because we don't need local spark code for standalone; it's completely docker based.





[GitHub] [spark-docker] Yikun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001476421


##########
testing/testing.sh:
##########
@@ -0,0 +1,207 @@
+#!/bin/bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This test script runs a simple smoke test in standalone cluster:
+# - create docker network
+# - start up a master
+# - start up a worker
+# - wait for the web UI endpoint to return successfully
+# - run a simple smoke test in standalone cluster
+# - clean up test resource
+
+CURL_TIMEOUT=1
+CURL_COOLDOWN=1
+CURL_MAX_TRIES=30
+
+NETWORK_NAME=spark-net-bridge
+
+SUBMIT_CONTAINER_NAME=spark-submit
+MASTER_CONTAINER_NAME=spark-master
+WORKER_CONTAINER_NAME=spark-worker
+SPARK_MASTER_PORT=7077
+SPARK_MASTER_WEBUI_CONTAINER_PORT=8080
+SPARK_MASTER_WEBUI_HOST_PORT=8080
+SPARK_WORKER_WEBUI_CONTAINER_PORT=8081
+SPARK_WORKER_WEBUI_HOST_PORT=8081
+
+SCALA_VERSION="2.12"
+SPARK_VERSION="3.3.0"
+IMAGE_URL=
+
+# Create a new docker bridge network
+function create_network() {
+  if [ ! -z $(docker network ls --filter name=^${NETWORK_NAME}$ --format="{{ .Name }}") ]; then
+    # bridge network already exists, need to kill containers attached to the network and remove network
+    cleanup
+    remove_network
+  fi
+  docker network create --driver bridge "$NETWORK_NAME" > /dev/null
+}
+
+# Remove docker network
+function remove_network() {
+    docker network rm "$NETWORK_NAME" > /dev/null
+}
+
+# Find and kill any remaining containers attached to the network
+function cleanup() {
+  local containers
+  containers="$(docker ps --quiet --filter network="$NETWORK_NAME")"
+
+  if [ -n "$containers" ]; then
+    echo >&2 -n "==> Killing $(echo -n "$containers" | grep -c '^') orphaned container(s)..."
+    echo "$containers" | xargs docker kill > /dev/null
+    echo >&2 " done."
+  fi
+}
+
+# Exec docker run command
+function docker_run() {
+  local container_name="$1"
+  local docker_run_command="$2"
+  local args="$3"
+
+  echo >&2 "===> Starting ${container_name}"
+  if [ "$container_name" = "$MASTER_CONTAINER_NAME" -o "$container_name" = "$WORKER_CONTAINER_NAME" ]; then
+    # --detach: Run spark-master and spark-worker in background, like spark-daemon.sh behaves
+    eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $IMAGE_URL ${args}"
+  else
+    eval "docker run --rm --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $IMAGE_URL ${args}"
+  fi
+}
+
+# Start up a spark master
+function start_spark_master() {
+  docker_run \
+    "$MASTER_CONTAINER_NAME" \
+    "--publish $SPARK_MASTER_WEBUI_HOST_PORT:$SPARK_MASTER_WEBUI_CONTAINER_PORT $1" \
+    "/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master" > /dev/null
+}
+
+# Start up a spark worker
+function start_spark_worker() {
+  docker_run \
+    "$WORKER_CONTAINER_NAME" \
+    "--publish $SPARK_WORKER_WEBUI_HOST_PORT:$SPARK_WORKER_WEBUI_CONTAINER_PORT $1" \
+    "/opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT" > /dev/null
+}
+
+# Wait container ready until endpoint returns successfully
+function wait_container_ready() {
+  local container_name="$1"
+  local host_port="$2"
+  i=0
+  echo >&2 "===> Waiting for ${container_name} to be ready..."
+  while true; do
+    i=$((i+1))
+
+    set +e
+
+    curl \
+      --silent \
+      --max-time "$CURL_TIMEOUT" \
+      localhost:"${host_port}" \
+      > /dev/null
+
+    result=$?
+
+    set -e
+
+    if [ "$result" -eq 0 ]; then
+      break
+    fi
+
+    if [ "$i" -gt "$CURL_MAX_TRIES" ]; then
+      echo >&2 "===> \$CURL_MAX_TRIES exceeded waiting for ${container_name} to be ready"
+      return 1

Review Comment:
   If we exit here, the docker resources (container and network) will not be cleaned up.
   
   But I am OK with it, because it helps debugging when a failure happens, and the old resources will also be cleaned up and recreated on the next try (L53-L54).
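   
   For reference, a bash exit trap would clean up even on failure; a sketch using the functions above, not what this PR does:
   
   ```bash
   # Kill leftover containers and remove the network on any exit path:
   function on_exit() {
     cleanup
     remove_network || true  # the network may not exist yet; don't mask the real exit code
   }
   trap on_exit EXIT
   ```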
   





[GitHub] [spark-docker] Yikun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001322284


##########
testing/run_tests.sh:
##########
@@ -0,0 +1,49 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+set -exo errexit

Review Comment:
   `#!/bin/bash -e` is already equivalent to `set -e`.
   
   -e  Exit immediately if a command exits with a non-zero status.
   -o option-name
   -x  Print commands and their arguments as they are executed.
   
   Omitting `-x` might help reduce the log output.
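   
   For example, these two forms behave the same (throwaway sketch):
   
   ```bash
   #!/bin/bash
   set -e            # same effect as writing `#!/bin/bash -e` on the shebang line
   set -x            # -x echoes each command before running it (verbose logs)
   false             # with -e, the script exits here with status 1
   echo "never reached"
   ```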
   
   





[GitHub] [spark-docker] Yikun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001328560


##########
.github/workflows/main.yml:
##########
@@ -60,6 +60,9 @@ jobs:
           - ${{ inputs.java }}
         image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu]
     steps:
+      - name : Test - Run spark application for standalone cluster on docker

Review Comment:
   I meant https://github.com/apache/spark-docker/pull/15#discussion_r1000504095: move it before `- name: Test - Checkout Spark repository` (L166); we need to check out `spark-docker` and build the image first. : )





[GitHub] [spark-docker] Yikun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001328560


##########
.github/workflows/main.yml:
##########
@@ -60,6 +60,9 @@ jobs:
           - ${{ inputs.java }}
         image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu]
     steps:
+      - name : Test - Run spark application for standalone cluster on docker

Review Comment:
   I meant https://github.com/apache/spark-docker/pull/15#discussion_r1000504095: before `- name: Test - Checkout Spark repository` (L166).





[GitHub] [spark-docker] Yikun commented on pull request #15: [SPARK-40569] Expose default port for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on PR #15:
URL: https://github.com/apache/spark-docker/pull/15#issuecomment-1282360127

   Thanks!
   
   Try making the change on the template and then applying it (see the sketch below). :)
   1. Change the Dockerfile.template.
   2. Run ./add-dockerfiles.sh 3.3.0.
   
   Could you fill in the PR description with test results for the EXPOSE port?
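   
   A minimal sketch of that flow (the `EXPOSE` edit is just an example change):
   
   ```bash
   # 1. Add the port to the template:
   echo "EXPOSE 7077" >> Dockerfile.template
   
   # 2. Regenerate the per-version Dockerfiles from the template:
   ./add-dockerfiles.sh 3.3.0
   ```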




[GitHub] [spark-docker] dcoliversun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001361792


##########
.github/workflows/main.yml:
##########
@@ -60,6 +60,9 @@ jobs:
           - ${{ inputs.java }}
         image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu]
     steps:
+      - name : Test - Run spark application for standalone cluster on docker

Review Comment:
   Oh, my mistake. I only searched for `Checkout Spark repository`. What is the difference between `Checkout Spark repository` and `Test - Checkout Spark repository`? It's a little misleading.





[GitHub] [spark-docker] dcoliversun commented on pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on PR #15:
URL: https://github.com/apache/spark-docker/pull/15#issuecomment-1285152926

   @Yikun Agreed.  
   > A quick question, how many ports should be exposed in the future?  
   
   I believe we should only expose 7077, because it is used for the connection between master and worker; the other ports that should be published are used to access the application from the host.  
   
   > For common ports, we could write them directly in the dockerfile, and for the less commonly used ones, we can expose them to users through docker -p port:port and document them, WDYT?  
   
   Yes, I have [SPARK-40570](https://issues.apache.org/jira/browse/SPARK-40570) to do it. 8080 (master UI port) / 8081 (worker UI port) / 4040 (live app UI port) / 6066 (external service) should be published, and I will document them with the command `docker run --publish <host_port>:<container_port>` (see the sketch below).
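   
   For example, publishing the master's ports at run time could look like this (the image tag is a placeholder):
   
   ```bash
   # Publish the master web UI (8080) and the cluster port (7077) to the host:
   docker run --rm --detach --name spark-master \
     --publish 8080:8080 --publish 7077:7077 \
     my-repo/spark:3.3.0 \
     /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
   ```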




[GitHub] [spark-docker] Yikun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001328560


##########
.github/workflows/main.yml:
##########
@@ -60,6 +60,9 @@ jobs:
           - ${{ inputs.java }}
         image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu]
     steps:
+      - name : Test - Run spark application for standalone cluster on docker

Review Comment:
   I meant `- name: Test - Checkout Spark repository` (L166).



##########
.github/workflows/main.yml:
##########
@@ -60,6 +60,9 @@ jobs:
           - ${{ inputs.java }}
         image_suffix: [python3-ubuntu, ubuntu, r-ubuntu, python3-r-ubuntu]
     steps:
+      - name : Test - Run spark application for standalone cluster on docker

Review Comment:
   I meant before `- name: Test - Checkout Spark repository` (L166).





[GitHub] [spark-docker] Yikun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001320419


##########
testing/testing.sh:
##########
@@ -0,0 +1,164 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This test script runs a simple smoke test in standalone cluster:
+# - create docker network
+# - start up a master
+# - start up a worker
+# - wait for the web UI endpoint to return successfully
+# - run a simple smoke test in standalone cluster
+# - clean up test resource
+
+CURL_TIMEOUT=1
+CURL_COOLDOWN=1
+CURL_MAX_TRIES=30
+
+NETWORK_NAME=spark-net-bridge
+
+SUBMIT_CONTAINER_NAME=spark-submit
+MASTER_CONTAINER_NAME=spark-master
+WORKER_CONTAINER_NAME=spark-worker
+SPARK_MASTER_PORT=7077
+SPARK_MASTER_WEBUI_CONTAINER_PORT=8080
+SPARK_MASTER_WEBUI_HOST_PORT=8080
+SPARK_WORKER_WEBUI_CONTAINER_PORT=8081
+SPARK_WORKER_WEBUI_HOST_PORT=8081
+
+# Create a new docker bridge network
+function create_network() {
+    docker network create --driver bridge "$NETWORK_NAME" > /dev/null
+}
+
+# Remove docker network
+function remove_network() {
+    docker network rm "$NETWORK_NAME" > /dev/null
+}
+
+# Find and kill any remaining containers attached to the network
+function cleanup() {
+  local containers
+  containers="$(docker ps --quiet --filter network="$NETWORK_NAME")"
+
+  if [ -n "$containers" ]; then
+    echo >&2 -n "==> Killing $(echo -n "$containers" | grep -c '^') orphaned container(s)..."
+    echo "$containers" | xargs docker kill > /dev/null
+    echo >&2 " done."
+  fi
+}
+
+function docker_run() {
+  local container_name="$1"
+  local docker_run_command="$2"
+  local args="$3"
+
+  echo >&2 "===> Starting ${container_name}"
+  if [ "$container_name" = "$MASTER_CONTAINER_NAME" -o "$container_name" = "$WORKER_CONTAINER_NAME" ]; then
+    # --detach: Run spark-master and spark-worker in background, like spark-daemon.sh behaves
+    eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+  else
+    eval "docker run --rm --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+  fi
+}
+
+function start_spark_master() {
+  docker_run \
+    "$MASTER_CONTAINER_NAME" \
+    "--publish $SPARK_MASTER_WEBUI_HOST_PORT:$SPARK_MASTER_WEBUI_CONTAINER_PORT $1" \
+    "/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master" > /dev/null
+}
+
+function start_spark_worker() {
+  docker_run \
+    "$WORKER_CONTAINER_NAME" \
+    "--publish $SPARK_WORKER_WEBUI_HOST_PORT:$SPARK_WORKER_WEBUI_CONTAINER_PORT $1" \
+    "/opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT" > /dev/null
+}
+
+function wait_container_ready() {
+  local container_name="$1"
+  local host_port="$2"
+  i=0
+  echo >&2 "===> Waiting for ${container_name} to be ready..."
+  while true; do
+    i=$((i+1))
+
+    set +e
+
+    curl \
+      --silent \
+      --max-time "$CURL_TIMEOUT" \
+      localhost:"${host_port}" \
+      > /dev/null
+
+    result=$?
+
+    set -e
+
+    if [ "$result" -eq 0 ]; then
+      break
+    fi
+
+    if [ "$i" -gt "$CURL_MAX_TRIES" ]; then
+      echo >&2 "===> \$CURL_MAX_TRIES exceeded waiting for ${container_name} to be ready"
+      return 1
+    fi
+
+    sleep "$CURL_COOLDOWN"
+  done
+
+  echo >&2 "===> ${container_name} is ready."
+}
+
+function run_spark_pi() {
+  docker_run \
+    "$SUBMIT_CONTAINER_NAME" \
+    "$1" \
+    "/opt/spark/bin/spark-submit --master spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT --class org.apache.spark.examples.SparkPi /opt/spark/examples/jars/spark-examples_$SCALA_VERSION-$SPARK_VERSION.jar 20"
+}
+
+# Run smoke standalone test
+function run_smoke_test_in_standalone() {
+  local docker_run_command=$1
+
+  create_network
+  cleanup
+
+  start_spark_master "${docker_run_command}"
+  start_spark_worker "${docker_run_command}"
+
+  wait_container_ready "$MASTER_CONTAINER_NAME" "$SPARK_MASTER_WEBUI_HOST_PORT"
+  wait_container_ready "$WORKER_CONTAINER_NAME" "$SPARK_WORKER_WEBUI_HOST_PORT"
+
+  run_spark_pi "${docker_run_command}"
+
+  cleanup
+  remove_network
+}
+
+# Run a master and worker and verify they start up and connect to each other successfully.
+# And run a smoke test in standalone cluster.
+function smoke_test() {
+  local image_url=$TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG

Review Comment:
   Do we want to make `image_url` configurable?
   
   Then we can also run `testing.sh image_url --scala-version xxx --java-version yyy` as a local test.



##########
testing/run_tests.sh:
##########
@@ -0,0 +1,49 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+set -exo errexit
+
+SCALA_VERSION="2.12"
+SPARK_VERSION="3.3.0"
+
+# Parse arguments
+while (( "$#" )); do
+  case $1 in
+    --scala-version)
+      SCALA_VERSION="$2"
+      shift
+      ;;
+    --spark-version)
+      SPARK_VERSION="$2"
+      shift
+      ;;
+    *)
+      echo "Unexpected command line flag $2 $1."
+      exit 1
+      ;;
+  esac
+  shift
+done

Review Comment:
   What I meant was: add the `Parse arguments` block into `testing.sh`, and keep run_tests.sh simple (see the sketch below).
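   
   I.e., roughly this inside `testing.sh` (a sketch; `--image-url` per the other thread):
   
   ```bash
   # Hypothetical argument parsing moved into testing.sh:
   while (( "$#" )); do
     case $1 in
       --image-url)     IMAGE_URL="$2";     shift ;;
       --scala-version) SCALA_VERSION="$2"; shift ;;
       --spark-version) SPARK_VERSION="$2"; shift ;;
       *) echo "Unexpected command line flag $1." >&2; exit 1 ;;
     esac
     shift
   done
   ```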





[GitHub] [spark-docker] Yikun commented on pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
Yikun commented on PR #15:
URL: https://github.com/apache/spark-docker/pull/15#issuecomment-1285130451

   @dcoliversun Thanks, CI is green. I will take a look soon.
   
   A quick question: how many ports should be exposed in the future? For common ports, we could write them directly in the dockerfile, and for the less commonly used ones, we can expose them to users through `docker -p port:port` and document them, WDYT?
   
   




[GitHub] [spark-docker] dcoliversun commented on pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on PR #15:
URL: https://github.com/apache/spark-docker/pull/15#issuecomment-1296495910

   I have discussed this with @Yikun offline: because the ports required by the master and worker differ, we will not unify the exposure in the `Dockerfile`; ports will be published in the `docker run` command instead. I will submit a new PR to do the job; it does not modify the `Dockerfile` and only adds standalone-cluster smoke tests for local runs and GA.




[GitHub] [spark-docker] dcoliversun commented on a diff in pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on code in PR #15:
URL: https://github.com/apache/spark-docker/pull/15#discussion_r1001439240


##########
testing/testing.sh:
##########
@@ -0,0 +1,164 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This test script runs a simple smoke test in standalone cluster:
+# - create docker network
+# - start up a master
+# - start up a worker
+# - wait for the web UI endpoint to return successfully
+# - run a simple smoke test in standalone cluster
+# - clean up test resource
+
+CURL_TIMEOUT=1
+CURL_COOLDOWN=1
+CURL_MAX_TRIES=30
+
+NETWORK_NAME=spark-net-bridge
+
+SUBMIT_CONTAINER_NAME=spark-submit
+MASTER_CONTAINER_NAME=spark-master
+WORKER_CONTAINER_NAME=spark-worker
+SPARK_MASTER_PORT=7077
+SPARK_MASTER_WEBUI_CONTAINER_PORT=8080
+SPARK_MASTER_WEBUI_HOST_PORT=8080
+SPARK_WORKER_WEBUI_CONTAINER_PORT=8081
+SPARK_WORKER_WEBUI_HOST_PORT=8081
+
+# Create a new docker bridge network
+function create_network() {
+    docker network create --driver bridge "$NETWORK_NAME" > /dev/null
+}
+
+# Remove docker network
+function remove_network() {
+    docker network rm "$NETWORK_NAME" > /dev/null
+}
+
+# Find and kill any remaining containers attached to the network
+function cleanup() {
+  local containers
+  containers="$(docker ps --quiet --filter network="$NETWORK_NAME")"
+
+  if [ -n "$containers" ]; then
+    echo >&2 -n "==> Killing $(echo -n "$containers" | grep -c '^') orphaned container(s)..."
+    echo "$containers" | xargs docker kill > /dev/null
+    echo >&2 " done."
+  fi
+}
+
+function docker_run() {
+  local container_name="$1"
+  local docker_run_command="$2"
+  local args="$3"
+
+  echo >&2 "===> Starting ${container_name}"
+  if [ "$container_name" = "$MASTER_CONTAINER_NAME" -o "$container_name" = "$WORKER_CONTAINER_NAME" ]; then
+    # --detach: Run spark-master and spark-worker in background, like spark-daemon.sh behaves
+    eval "docker run --rm --detach --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+  else
+    eval "docker run --rm --network $NETWORK_NAME --name ${container_name} ${docker_run_command} $image_url ${args}"
+  fi
+}
+
+function start_spark_master() {
+  docker_run \
+    "$MASTER_CONTAINER_NAME" \
+    "--publish $SPARK_MASTER_WEBUI_HOST_PORT:$SPARK_MASTER_WEBUI_CONTAINER_PORT $1" \
+    "/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master" > /dev/null
+}
+
+function start_spark_worker() {
+  docker_run \
+    "$WORKER_CONTAINER_NAME" \
+    "--publish $SPARK_WORKER_WEBUI_HOST_PORT:$SPARK_WORKER_WEBUI_CONTAINER_PORT $1" \
+    "/opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT" > /dev/null
+}
+
+function wait_container_ready() {
+  local container_name="$1"
+  local host_port="$2"
+  i=0
+  echo >&2 "===> Waiting for ${container_name} to be ready..."
+  while true; do
+    i=$((i+1))
+
+    set +e
+
+    curl \
+      --silent \
+      --max-time "$CURL_TIMEOUT" \
+      localhost:"${host_port}" \
+      > /dev/null
+
+    result=$?
+
+    set -e
+
+    if [ "$result" -eq 0 ]; then
+      break
+    fi
+
+    if [ "$i" -gt "$CURL_MAX_TRIES" ]; then
+      echo >&2 "===> \$CURL_MAX_TRIES exceeded waiting for ${container_name} to be ready"
+      return 1
+    fi
+
+    sleep "$CURL_COOLDOWN"
+  done
+
+  echo >&2 "===> ${container_name} is ready."
+}
+
+function run_spark_pi() {
+  docker_run \
+    "$SUBMIT_CONTAINER_NAME" \
+    "$1" \
+    "/opt/spark/bin/spark-submit --master spark://$MASTER_CONTAINER_NAME:$SPARK_MASTER_PORT --class org.apache.spark.examples.SparkPi /opt/spark/examples/jars/spark-examples_$SCALA_VERSION-$SPARK_VERSION.jar 20"
+}
+
+# Run smoke standalone test
+function run_smoke_test_in_standalone() {
+  local docker_run_command=$1
+
+  create_network
+  cleanup
+
+  start_spark_master "${docker_run_command}"
+  start_spark_worker "${docker_run_command}"
+
+  wait_container_ready "$MASTER_CONTAINER_NAME" "$SPARK_MASTER_WEBUI_HOST_PORT"
+  wait_container_ready "$WORKER_CONTAINER_NAME" "$SPARK_WORKER_WEBUI_HOST_PORT"
+
+  run_spark_pi "${docker_run_command}"
+
+  cleanup
+  remove_network
+}
+
+# Run a master and worker and verify they start up and connect to each other successfully.
+# And run a smoke test in standalone cluster.
+function smoke_test() {
+  local image_url=$TEST_REPO/$IMAGE_NAME:$UNIQUE_IMAGE_TAG

Review Comment:
   Good idea. Add a `--image-url` option. For now, this option is required; after the spark-docker images are published, we could set a default value. 



##########
testing/run_tests.sh:
##########
@@ -0,0 +1,49 @@
+#!/bin/bash -e
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+set -exo errexit
+
+SCALA_VERSION="2.12"
+SPARK_VERSION="3.3.0"
+
+# Parse arguments
+while (( "$#" )); do
+  case $1 in
+    --scala-version)
+      SCALA_VERSION="$2"
+      shift
+      ;;
+    --spark-version)
+      SPARK_VERSION="$2"
+      shift
+      ;;
+    *)
+      echo "Unexpected command line flag $2 $1."
+      exit 1
+      ;;
+  esac
+  shift
+done

Review Comment:
   done


