Posted to user@spark.apache.org by Cristina Rozee <ro...@gmail.com> on 2016/09/15 11:40:47 UTC

Total Shuffle Read and Write Size of Spark workload

Hello,

I am running a Spark application and would like to know the total amount
of shuffle data (read + write). Could anyone let me know how to get this
information?

Thank you
Cristina.

Re: Total Shuffle Read and Write Size of Spark workload

Posted by Mike Metzger <mi...@flexiblecreations.com>.
While the SparkListener method is likely better all around, if you just
need this quickly you should be able to set up SSH local port forwarding
with PuTTY. In the PuTTY configuration:

- Go to Connection > SSH > Tunnels
- In the Source port field, enter 4040 (or another unused port on your
machine)
- In the Destination field, enter ipaddress:4040, where ipaddress is the IP
you would normally use to reach the Spark server. If it is the same server
you are SSH'ing into, it can be 127.0.0.1
- Make sure the "Local" and "Auto" radio buttons are selected and click "Add"
- Go back to the Session section and enter the IP and other connection
details
- If you are going to use this often, enter a name and save the
configuration. Otherwise, click Open and log in as normal.

Once the session is established, you should be able to open a web browser
to http://localhost:4040, which will redirect over the SSH session to the
remote server. Note that any link referencing a non-accessible IP address
cannot be reached (though you can also set up PuTTY / SSH as a proxy to get
around that if needed).
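For non-PuTTY clients, the same forward can be set up with a single OpenSSH
command; `user` and `spark-host` below are placeholders for your own login
and server:

```shell
# Forward local port 4040 to port 4040 on the Spark driver host,
# then browse to http://localhost:4040 on your own machine.
ssh -L 4040:127.0.0.1:4040 user@spark-host
```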

Thanks

Mike


On Mon, Sep 19, 2016 at 4:43 AM, Cristina Rozee <ro...@gmail.com>
wrote:

> Hi Mich,
>
> I do not have access to the UI, as I am running jobs on a remote system
> that I can reach only through PuTTY, so only the console and log files
> are available to me.
>
> Thanks

Re: Total Shuffle Read and Write Size of Spark workload

Posted by Cristina Rozee <ro...@gmail.com>.
Hi Mich,

I do not have access to the UI, as I am running jobs on a remote system
that I can reach only through PuTTY, so only the console and log files are
available to me.

Thanks
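Given the constraint above (console and log files only), one workable
option, if the application ran with spark.eventLog.enabled set, is to sum
the task-level shuffle metrics straight out of the JSON event log. A rough
sketch; `total_shuffle_bytes` is a made-up helper name, and the field names
follow the event-log format as I understand it, which can vary slightly
between Spark versions:

```python
import json

def total_shuffle_bytes(lines):
    """Sum shuffle read/write bytes from Spark event-log JSON lines."""
    read = write = 0
    for line in lines:
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip any non-JSON line
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        metrics = event.get("Task Metrics") or {}
        rm = metrics.get("Shuffle Read Metrics") or {}
        wm = metrics.get("Shuffle Write Metrics") or {}
        read += rm.get("Remote Bytes Read", 0) + rm.get("Local Bytes Read", 0)
        write += wm.get("Shuffle Bytes Written", 0)
    return read, write

# Tiny synthetic log, only to illustrate the expected shape:
sample = [
    json.dumps({"Event": "SparkListenerTaskEnd",
                "Task Metrics": {
                    "Shuffle Read Metrics": {"Remote Bytes Read": 100,
                                             "Local Bytes Read": 50},
                    "Shuffle Write Metrics": {"Shuffle Bytes Written": 200}}}),
    json.dumps({"Event": "SparkListenerJobEnd"}),
]
print(total_shuffle_bytes(sample))  # (150, 200)
```

On a real log you would pass the lines of the file from
spark.eventLog.dir instead of the synthetic sample.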

On Mon, Sep 19, 2016 at 11:36 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

> Spark UI on port 4040 by default
>
> HTH

Re: Total Shuffle Read and Write Size of Spark workload

Posted by Jacek Laskowski <ja...@japila.pl>.
On Mon, Sep 19, 2016 at 11:36 AM, Mich Talebzadeh
<mi...@gmail.com> wrote:
> Spark UI on port 4040 by default

That's exactly *a* SparkListener + web UI :)

Jacek

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Total Shuffle Read and Write Size of Spark workload

Posted by Mich Talebzadeh <mi...@gmail.com>.
Spark UI on port 4040 by default

HTH
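When only a console is available, the numbers behind the UI can also be
pulled as JSON from Spark's monitoring REST API (available since Spark
1.4), which listens on the same port as the UI; `<app-id>` below is a
placeholder for the id returned by the first call:

```shell
# List applications known to this driver to find the app id.
curl http://localhost:4040/api/v1/applications
# Per-stage metrics; each stage reports shuffleReadBytes and shuffleWriteBytes.
curl http://localhost:4040/api/v1/applications/<app-id>/stages
```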

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 19 September 2016 at 10:34, Cristina Rozee <ro...@gmail.com>
wrote:

> Could you please explain a little bit?

Re: Total Shuffle Read and Write Size of Spark workload

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi Cristina,

http://blog.jaceklaskowski.pl/spark-workshop/slides/08_Monitoring_using_SparkListeners.html

http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.scheduler.SparkListener
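To make the listener suggestion concrete, a minimal sketch (assuming the
Spark 2.x metrics API; metric names differ slightly in 1.x, and the class
name here is made up):

```scala
import java.util.concurrent.atomic.AtomicLong

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Accumulates shuffle read/write bytes across all completed tasks.
class ShuffleTotalsListener extends SparkListener {
  val shuffleReadBytes  = new AtomicLong(0L)
  val shuffleWriteBytes = new AtomicLong(0L)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    if (metrics != null) {
      shuffleReadBytes.addAndGet(metrics.shuffleReadMetrics.totalBytesRead)
      shuffleWriteBytes.addAndGet(metrics.shuffleWriteMetrics.bytesWritten)
    }
  }
}
```

Register it on the SparkContext before the job runs
(`sc.addSparkListener(new ShuffleTotalsListener)`), then read the two
counters once the job finishes.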

Let me know if you've got more questions.

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Mon, Sep 19, 2016 at 11:34 AM, Cristina Rozee
<ro...@gmail.com> wrote:
> Could you please explain a little bit?



Re: Total Shuffle Read and Write Size of Spark workload

Posted by Cristina Rozee <ro...@gmail.com>.
Could you please explain a little bit?

On Sun, Sep 18, 2016 at 10:19 PM, Jacek Laskowski <ja...@japila.pl> wrote:

> SparkListener perhaps?
>
> Jacek

Re: Total Shuffle Read and Write Size of Spark workload

Posted by Jacek Laskowski <ja...@japila.pl>.
SparkListener perhaps?

Jacek

On 15 Sep 2016 1:41 p.m., "Cristina Rozee" <ro...@gmail.com> wrote:

> Hello,
>
> I am running a Spark application and would like to know the total amount
> of shuffle data (read + write). Could anyone let me know how to get this
> information?
>
> Thank you
> Cristina.
>