Posted to commits@systemds.apache.org by ja...@apache.org on 2021/05/05 17:05:40 UTC

[systemds] branch master updated: Preliminary instruction to run an example spark job

This is an automated email from the ASF dual-hosted git repository.

janardhan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/systemds.git


The following commit(s) were added to refs/heads/master by this push:
     new 947ec2a  Preliminary instruction to run an example spark job
947ec2a is described below

commit 947ec2ab534f664662e94cd719ca3b12ef9f8337
Author: j143 <j1...@protonmail.com>
AuthorDate: Wed May 5 22:18:10 2021 +0530

    Preliminary instruction to run an example spark job
---
 scripts/staging/google-cloud/README.md | 58 ++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/scripts/staging/google-cloud/README.md b/scripts/staging/google-cloud/README.md
new file mode 100644
index 0000000..1890af2
--- /dev/null
+++ b/scripts/staging/google-cloud/README.md
@@ -0,0 +1,58 @@
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+## Create a Dataproc cluster
+
+Set a name for the cluster:
+```sh
+CLUSTERNAME=dp-systemds
+```
+
+Set the Dataproc cluster region:
+```sh
+gcloud config set dataproc/region us-central1
+```
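+
+As an optional sanity check, the configured region can be read back (`gcloud config get-value` is a standard gcloud command):
+```sh
+gcloud config get-value dataproc/region
+```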
+
+Now, create a new cluster (see the
+[`gcloud dataproc clusters create` reference](https://cloud.google.com/sdk/gcloud/reference/dataproc/clusters/create)):
+```sh
+gcloud dataproc clusters create ${CLUSTERNAME} \
+  --scopes=cloud-platform \
+  --tags=systemds \
+  --zone=us-central1-c \
+  --worker-machine-type=n1-standard-2 \
+  --worker-boot-disk-size=500 \
+  --master-machine-type=n1-standard-2 \
+  --master-boot-disk-size=500 \
+  --image-version=2.0
+```
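+
+Cluster creation takes a few minutes. Once the command returns, the cluster's status can be inspected as an optional check:
+```sh
+gcloud dataproc clusters describe ${CLUSTERNAME}
+```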
+
+## Submit a Spark job to the cluster
+
+Jobs can be submitted via a Cloud Dataproc API
+[`jobs.submit`](https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.jobs/submit) request.
+
+Submit an example job using the `gcloud` tool from the Cloud Shell command line:
+
+```sh
+gcloud dataproc jobs submit spark --cluster ${CLUSTERNAME} \
+  --class org.apache.spark.examples.SparkPi \
+  --jars file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000
+```