Posted to user@spark.apache.org by Anahita Talebi <an...@gmail.com> on 2017/02/07 16:33:30 UTC

submit a spark code on google cloud

Hello Friends,

I am trying to run a Spark job on multiple machines. To this aim, I submit
the job through the job-submission page on the Google Cloud Platform:
https://cloud.google.com/dataproc/docs/guides/submit-job

I have created a cluster with 6 nodes. Does anyone know how I can tell
which nodes participate when I run the code on the cluster?

Thanks a lot,
Anahita
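
One way to see which nodes participate, without the web UI, is Spark's monitoring REST API: while an application is running, http://<master-node-name>:4040/api/v1/applications/<app-id>/executors returns one JSON record per executor, and each record's "hostPort" field names the node it runs on. Below is a minimal sketch of extracting the participating hosts from such a response; the payload is a hand-written sample shaped like the real API output, with hypothetical Dataproc-style worker names, not actual cluster data.

```python
import json

def participating_hosts(executors_json: str) -> set:
    """Return the worker hostnames from a /executors API response.

    Each record's "hostPort" looks like "host:port"; the "driver"
    record is skipped because it is not a worker executor.
    """
    hosts = set()
    for executor in json.loads(executors_json):
        if executor["id"] == "driver":
            continue
        host, _, _port = executor["hostPort"].rpartition(":")
        hosts.add(host)
    return hosts

# Hand-written sample payload (hypothetical worker names).
sample = json.dumps([
    {"id": "driver", "hostPort": "cluster-1-m:41225"},
    {"id": "1", "hostPort": "cluster-1-w-0:34001"},
    {"id": "2", "hostPort": "cluster-1-w-1:34002"},
    {"id": "3", "hostPort": "cluster-1-w-0:34003"},
])

print(sorted(participating_hosts(sample)))
```

Every host that shows up in the result took part in the application; fetching the live JSON through the SOCKS proxy discussed later in the thread works the same way.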

Fwd: submit a spark code on google cloud

Posted by Anahita Talebi <an...@gmail.com>.
Dear friends,

I am trying to understand how many of the cluster's nodes are involved
when I run a Spark job on the cluster.

I followed the commands described in the link below to get access to the
cluster:
https://cloud.google.com/dataproc/docs/concepts/cluster-web-interfaces

After that, I went to the following link in the Google Chrome browser
(the name of my cluster is cluster-1, with 6 nodes):

http://cluster-1-m:4040

The thing that I cannot understand is that the value of "Memory used" is
0 B, even though I could run the Spark code on Google Cloud using the same
cluster and got my final results. Logically, some memory must have been
used. So can somebody give me a hint how I can figure out which nodes were
involved when I ran the code on Google Cloud?

[image: Inline image 1]


Thanks a lot,
Anahita
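
Since Spark on Dataproc runs on YARN, another place to look is the YARN ResourceManager, reachable on port 8088 through the same tunnel per the cluster-web-interfaces page: its REST endpoint /ws/v1/cluster/nodes reports used memory per node, which bears on both questions at once. Here is a sketch of picking out the nodes that actually used memory; the payload is a hand-written sample in the shape of the YARN API response, with hypothetical worker names, not real cluster output.

```python
import json

def busy_nodes(nodes_json: str) -> list:
    """Return (hostname, used MB) for every node reporting non-zero memory use."""
    payload = json.loads(nodes_json)
    return [
        (node["nodeHostName"], node["usedMemoryMB"])
        for node in payload["nodes"]["node"]
        if node["usedMemoryMB"] > 0
    ]

# Hand-written sample shaped like GET /ws/v1/cluster/nodes output.
sample = json.dumps({"nodes": {"node": [
    {"nodeHostName": "cluster-1-w-0", "state": "RUNNING", "usedMemoryMB": 4096},
    {"nodeHostName": "cluster-1-w-1", "state": "RUNNING", "usedMemoryMB": 0},
]}})

print(busy_nodes(sample))
```

A node reporting 0 MB here genuinely did no work for the job, whereas a 0 B reading in a web UI may simply mean the page being viewed is not the one tracking the running application.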

On Thu, Feb 9, 2017 at 12:25 PM, Dinko Srkoč <di...@gmail.com> wrote:

> On 9 February 2017 at 11:44, Anahita Talebi <an...@gmail.com>
> wrote:
> > [...]
> > I did all the steps on my local machine.
>
> Apparently not. See below.
>
> >
> > In fact I write
> > /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
> > as the Google Chrome executable path on my laptop.
> >
> > The thing which is strange for me is that when I write
> > /Applications/Google Chrome.app/Contents/MacOS/Google Chrome on the
> > terminal without running the first step, the Chrome browser opens.
>
> Without running the first step you are still on your machine. The
> problem is in your first step. Read on ...
>
> >
> > And also, one more thing, for the step 1 instead of writing
> > gcloud compute ssh --zone=europe-west1-d --ssh-flag="-D 1080"
> > --ssh-flag="-N" --ssh-flag="-n" cluster-1-m
> >
> > I wrote
> > gcloud compute ssh --zone=europe-west1-d  cluster-1-m
> >
> > I removed the flags, because with the flags it runs for more than 15
> > hours without finishing. That's why I thought to remove them.
> >
> > Do you think the problem is because of the missing flags?
>
> Yes. By removing the flags you just ssh-ed to the cluster's master -
> this is not what you want. You should execute the "gcloud compute ..."
> command with all the flags and leave it like that. Then open a *new*
> terminal window and run the Chrome from there. Only after you're done
> inspecting the Spark's web UI should you interrupt the "gcloud ..."
> command.
>
> >
> > I did like that:
> >
> > tsf-484-wpa-4-157:~ atalebi$ gcloud compute ssh --zone=europe-west1-d
> > cluster-1-m
> >
> > The programs included with the Debian GNU/Linux system are free software;
> > the exact distribution terms for each program are described in the
> > individual files in /usr/share/doc/*/copyright.
> >
> > Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
> > permitted by applicable law.
>
> Here you logged into the cluster's master. As said above, this is not
> what you want.
>
> > atalebi@cluster-1-m:~$ /Google\ Chrome.app/Contents/MacOS/Google\ Chrome
> > --proxy-server="socks5://localhost:1080" --host-resolver-rules="MAP *
> > 0.0.0.0 , EXCLUDE localhost"--user-data-dir=/tmp/
> > -bash: /Google Chrome.app/Contents/MacOS/Google Chrome: No such file or
> > directory
> > atalebi@cluster-1-m:~$
>
> ... and then you try to run Chrome on the "cluster-1-m" machine. There
> is no "/Applications" directory, nor Chrome installed there, and your
> command fails.
>
> Cheers,
> Dinko
>
> >
> >
> > Thanks a lot for your help,
> >
> > Anahita
> >
> >
> > On Wed, Feb 8, 2017 at 5:03 PM, Dinko Srkoč <di...@gmail.com>
> wrote:
> >>
> >> On 8 February 2017 at 16:33, Anahita Talebi <an...@gmail.com>
> >> wrote:
> >> > Hi Dinko,
> >> >
> >> > I ran both steps on my local laptop.
> >>
> >> Does it work then? I'm not sure from your answer.
> >>
> >> > But when I tried the second step right after the first step, I get the
> >> > error
> >> >
> >> > Method 1:
> >> >
> >> > atalebi@cluster-1-m:~$ /Applications/Google\
> >> > Chrome.app/Contents/MacOS/Google\ Chrome
> >> > --proxy-server="socks5://localhost:1080" --host-resolver-rules="MAP *
> >> > 0.0.0.0 , EXCLUDE localhost"--user-data-dir=/tmp/
> >> >
> >> > -bash: /Applications/Google Chrome.app/Contents/MacOS/Google Chrome: No
> >> > such file or directory
> >>
> >> "atalebi@cluster-1-m" would suggest that the command is run on the
> >> master from the "cluster-1" dataproc cluster. There is no
> >> "/Applications" directory there.
> >>
> >> >
> >> > While, if I don't run the first step, and do only the second step, I
> >> > don't
> >> > get any error.
> >>
> >> Regardless of the first step, if you try to run Chrome on the computer
> >> from the cluster it will fail because Chrome is not installed there.
> >>
> >> Anyway, SSH tunneling step needs to be done on the local machine
> >> before you can run Chrome (from the local machine, again). BTW, when
> >> you run `gcloud compute ssh ...` leave it hanging there, don't
> >> interrupt it with, say, ctrl-c, or by closing the terminal, until
> >> you're done with that.
> >>
> >> - Dinko
> >>
> >>
> >> >
> >> > Method 2:
> >> >
> >> > tsf-484-wpa-4-157:~ atalebi$ /Applications/Google\
> >> > Chrome.app/Contents/MacOS/Google\ Chrome
> >> > --proxy-server="socks5://localhost:1080" --host-resolver-rules="MAP *
> >> > 0.0.0.0 , EXCLUDE localhost"--user-data-dir=/tmp/
> >> >
> >> > So that's why I thought in method 1, it tries to find chrome on the
> >> > master
> >> > machine.
> >> >
> >> >
> >> > On Wed, Feb 8, 2017 at 4:28 PM, Dinko Srkoč <di...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi Anahita,
> >> >>
> >> >> From what I understood, you tried to run those steps on the cluster's
> >> >> master machine. That machine certainly doesn't have Chrome installed,
> >> >> and it probably isn't a Mac.
> >> >>
> >> >> You should run both steps on your local laptop.
> >> >>
> >> >> Cheers,
> >> >> Dinko
> >> >>
> >> >> On 8 February 2017 at 16:07, Anahita Talebi <
> anahita.t.amiri@gmail.com>
> >> >> wrote:
> >> >> > Hi Dinko,
> >> >> >
> >> >> > thanks a lot for your informative answer.
> >> >> > Actually, I tried what you suggested, but unfortunately I cannot get
> >> >> > the answer. I did the following (I am working on a Mac):
> >> >> >
> >> >> > 1) created an ssh tunnel
> >> >> > in the terminal I now have atalebi@cluster-1-m:~$
> >> >> > atalebi is my laptop's username, and since cluster-1-m appears after
> >> >> > the @, it means that I connected to the cluster. The name of my
> >> >> > cluster's master is "cluster-1-m"
> >> >> >
> >> >> > 2) Right after that, when I try to open the web browser, I get an
> >> >> > error which says
> >> >> >
> >> >> > atalebi@cluster-1-m:~$ /Applications/Google\
> >> >> > Chrome.app/Contents/MacOS/Google\ Chrome
> >> >> > --proxy-server="socks5://localhost:1080"
> --host-resolver-rules="MAP *
> >> >> > 0.0.0.0 , EXCLUDE localhost"--user-data-dir=/tmp/
> >> >> >
> >> >> > -bash: /Applications/Google Chrome.app/Contents/MacOS/Google Chrome:
> >> >> > No such file or directory
> >> >> >
> >> >> >
> >> >> > I think that I get this error because Google Chrome is not installed
> >> >> > on the cluster.
> >> >> >
> >> >> > 3) So I try to run the above command on my local machine terminal
> >> >> >
> >> >> >
> >> >> > tsf-484-wpa-4-157:~ atalebi$ /Applications/Google\
> >> >> > Chrome.app/Contents/MacOS/Google\ Chrome
> >> >> > --proxy-server="socks5://localhost:1080"
> --host-resolver-rules="MAP *
> >> >> > 0.0.0.0 , EXCLUDE localhost"--user-data-dir=/tmp/
> >> >> >
> >> >> >
> >> >> > In this case the Chrome browser opens, but when I enter
> >> >> >
> >> >> > http://cluster-1-m:4040
> >> >> >
> >> >> > I receive a message which says the page is not found.
> >> >> >
> >> >> >
> >> >> > Could you please help me what is the issue?
> >> >> >
> >> >> >
> >> >> > Many thanks in advance,
> >> >> >
> >> >> > Anahita
> >> >> >
> >> >> >
> >> >> > On Tue, Feb 7, 2017 at 10:27 PM, Dinko Srkoč <
> dinko.srkoc@gmail.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> Getting to the Spark web UI when Spark is running on Dataproc is
> not
> >> >> >> that straightforward. Connecting to that web interface is a two
> step
> >> >> >> process:
> >> >> >>
> >> >> >> 1. create an SSH tunnel
> >> >> >> 2. configure the browser to use a SOCKS proxy to connect
> >> >> >>
> >> >> >> The above steps are described here:
> >> >> >>
> >> >> >> https://cloud.google.com/dataproc/docs/concepts/cluster-web-interfaces
> >> >> >>
> >> >> >> Once you have your browser configured and running, go to the
> >> >> >> http://<master-node-name>:4040 for the Spark web UI and
> >> >> >> http://<master-node-name>:18080 for Spark's history server.
> >> >> >>
> >> >> >> <master-node-name> is the name of the cluster with "-m" appendage.
> >> >> >> So,
> >> >> >> if the cluster name is "mycluster", master will be called
> >> >> >> "mycluster-m".
> >> >> >>
> >> >> >> Cheers,
> >> >> >> Dinko
> >> >> >>
> >> >> >> On 7 February 2017 at 21:41, Jacek Laskowski <ja...@japila.pl>
> >> >> >> wrote:
> >> >> >> > Hi,
> >> >> >> >
> >> >> >> > I know nothing about Spark in GCP so answering this for a pure
> >> >> >> > Spark.
> >> >> >> >
> >> >> >> > Can you use web UI and Executors tab or a SparkListener?
> >> >> >> >
> >> >> >> > Jacek
> >> >> >> >
> >> >> >> > On 7 Feb 2017 5:33 p.m., "Anahita Talebi"
> >> >> >> > <an...@gmail.com>
> >> >> >> > wrote:
> >> >> >> >
> >> >> >> > Hello Friends,
> >> >> >> >
> >> >> >> > I am trying to run a spark code on multiple machines. To this
> aim,
> >> >> >> > I
> >> >> >> > submit
> >> >> >> > a spark code on submit job on google cloud platform.
> >> >> >> > https://cloud.google.com/dataproc/docs/guides/submit-job
> >> >> >> >
> >> >> >> > I have created a cluster with 6 nodes. Does anyone know how I
> can
> >> >> >> > realize
> >> >> >> > which nodes are participated when I run the code on the cluster?
> >> >> >> >
> >> >> >> > Thanks a lot,
> >> >> >> > Anahita
> >> >> >> >
> >> >> >> >
> >> >> >
> >> >> >
> >> >
> >> >
> >
> >
>

Re: submit a spark code on google cloud

Posted by Dinko Srkoč <di...@gmail.com>.
Getting to the Spark web UI when Spark is running on Dataproc is not
that straightforward. Connecting to that web interface is a two step
process:

1. create an SSH tunnel
2. configure the browser to use a SOCKS proxy to connect

The above steps are described here:
https://cloud.google.com/dataproc/docs/concepts/cluster-web-interfaces

Once you have your browser configured and running, go to the
http://<master-node-name>:4040 for the Spark web UI and
http://<master-node-name>:18080 for Spark's history server.

<master-node-name> is the cluster name with "-m" appended. So, if the
cluster name is "mycluster", the master will be called "mycluster-m".

Cheers,
Dinko

On 7 February 2017 at 21:41, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> I know nothing about Spark in GCP so answering this for a pure Spark.
>
> Can you use web UI and Executors tab or a SparkListener?
>
> Jacek
>
> On 7 Feb 2017 5:33 p.m., "Anahita Talebi" <an...@gmail.com> wrote:
>
> Hello Friends,
>
> I am trying to run a Spark job on multiple machines. To this aim, I submit
> the job through the job-submission page on the Google Cloud Platform:
> https://cloud.google.com/dataproc/docs/guides/submit-job
>
> I have created a cluster with 6 nodes. Does anyone know how I can tell
> which nodes participate when I run the code on the cluster?
>
> Thanks a lot,
> Anahita
>
>

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: submit a spark code on google cloud

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,

I know nothing about Spark in GCP so answering this for a pure Spark.

Can you use web UI and Executors tab or a SparkListener?

Jacek

On 7 Feb 2017 5:33 p.m., "Anahita Talebi" <an...@gmail.com> wrote:

Hello Friends,

I am trying to run a Spark job on multiple machines. To this aim, I submit
the job through the job-submission page on the Google Cloud Platform:
https://cloud.google.com/dataproc/docs/guides/submit-job

I have created a cluster with 6 nodes. Does anyone know how I can tell
which nodes participate when I run the code on the cluster?

Thanks a lot,
Anahita