Posted to user@spark.apache.org by gusiri <dr...@gmail.com> on 2016/09/22 13:54:09 UTC

Is executor computing time affected by network latency?

Hi,

When I increase the network latency among the Spark nodes, I see that the
compute time (= executor computing time in the Spark Web UI) also increases.

In the attached graph, left = 1 ms latency, right = 500 ms latency.

Is there any communication between the workers and the driver/master even
'during' executor computing? Or does anyone have another explanation for this
result?


<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27779/Screen_Shot_2016-09-21_at_5.png> 





Thank you very much in advance. 

//gusiri




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-executor-computing-time-affected-by-network-latency-tp27779.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Is executor computing time affected by network latency?

Posted by Mark Hamstra <ma...@clearstorydata.com>.
>
> The best network results are achieved when Spark nodes share the same
> hosts as Hadoop or they happen to be on the same subnet.
>

That's only true for those portions of a Spark execution pipeline that are
actually reading from HDFS.  If you're re-using an RDD for which the needed
shuffle files are already available on Executor nodes or are looking at
stages of a Spark SQL query execution later than those reading from HDFS,
then data locality and network utilization concerns don't really have
anything to do with co-location of Executors and HDFS data nodes.

On Fri, Sep 23, 2016 at 1:31 PM, Mich Talebzadeh <mi...@gmail.com>
wrote:

> Does this assume that Spark is running on the same hosts as HDFS? Hence
> does increasing the latency affects the network latency on Hadoop nodes as
> well in your tests?
>
> The best network results are achieved when Spark nodes share the same
> hosts as Hadoop or they happen to be on the same subnet.
>

Re: Is executor computing time affected by network latency?

Posted by Mich Talebzadeh <mi...@gmail.com>.
Does this assume that Spark is running on the same hosts as HDFS? If so,
does increasing the latency also affect the network latency on the Hadoop
nodes in your tests?

The best network results are achieved when Spark nodes share the same hosts
as Hadoop or they happen to be on the same subnet.


HTH


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




Re: Is executor computing time affected by network latency?

Posted by Peter Figliozzi <pe...@gmail.com>.
See the reference on shuffles
<http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/programming-guide.html#shuffle-operations>,
"Spark’s mechanism for re-distributing data so that it’s grouped
differently across partitions. This typically involves copying data across
executors and machines, making the shuffle a complex and costly operation."
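
To make that definition concrete, here is a minimal pure-Python sketch of what a
hash-partitioned regrouping does. It is not Spark code: `target_partition`,
`shuffle_by_key`, and the use of `zlib.crc32` as the hash are illustrative
assumptions, and Spark's own partitioner differs in detail. The point is only
that grouping by key forces records out of the partition they started in, which
is where the network gets involved.

```python
import zlib
from collections import defaultdict


def target_partition(key, num_partitions):
    """Pick an output partition from a stable hash of the key (illustrative)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def shuffle_by_key(partitions):
    """Regroup (key, value) records so that equal keys share one partition.

    `partitions` is a list of lists of (key, value) pairs, one inner list per
    input partition. Returns (new_partitions, moved), where `moved` counts the
    records whose output partition differs from their input partition.
    """
    n = len(partitions)
    new_partitions = [defaultdict(list) for _ in range(n)]
    moved = 0
    for src, records in enumerate(partitions):
        for key, value in records:
            dst = target_partition(key, n)
            new_partitions[dst][key].append(value)
            if dst != src:
                moved += 1  # this record would have to travel to another partition
    return [dict(p) for p in new_partitions], moved


if __name__ == "__main__":
    data = [
        [("a", 1), ("b", 2), ("c", 3)],
        [("a", 4), ("b", 5), ("d", 6)],
        [("c", 7), ("d", 8), ("a", 9)],
    ]
    grouped, moved = shuffle_by_key(data)
    print(grouped)
    print("records that changed partition:", moved)
```

Every key in the sample data starts out spread over at least two partitions, so
some records must move no matter how the hash falls. If the partitions live on
different machines, each such move is a network transfer.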



On Thu, Sep 22, 2016 at 4:14 PM, Soumitra Johri <
soumitra.siddharth@gmail.com> wrote:

> If your job involves a shuffle then the compute for the entire batch will
> increase with network latency. What would be interesting is to see how much
> time each task/job/stage takes.
>
> On Thu, Sep 22, 2016 at 5:11 PM Peter Figliozzi <pe...@gmail.com>
> wrote:
>
>> It seems to me they must communicate for joins, sorts, grouping, and so
>> forth, where the original data partitioning needs to change.  You could
>> repeat your experiment for different code snippets.  I'll bet it depends on
>> what you do.

Re: Is executor computing time affected by network latency?

Posted by Soumitra Johri <so...@gmail.com>.
If your job involves a shuffle, then the compute time for the entire batch
will increase with network latency. It would be interesting to see how much
time each task/job/stage takes.
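
A rough back-of-the-envelope model shows why a shuffle stage would be sensitive
to latency. The sketch below is an illustration under stated assumptions, not a
description of Spark's actual fetch protocol: `round_trips`, `bytes_to_fetch`,
and `bandwidth_bytes_per_s` are hypothetical inputs, the configured latency is
treated as a one-way delay, and real fetches overlap with each other. Only the
1 ms and 500 ms values come from the experiment in this thread.

```python
def estimated_fetch_seconds(latency_ms, round_trips, bytes_to_fetch, bandwidth_bytes_per_s):
    """Estimate the time a task spends fetching remote shuffle data.

    Model (assumption): every sequential round trip pays the one-way latency
    twice, and the payload itself is limited by bandwidth. Concurrency and
    pipelining are ignored.
    """
    rtt_s = 2 * latency_ms / 1000.0
    return round_trips * rtt_s + bytes_to_fetch / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical task: 10 sequential round trips, 100 MB over a 100 MB/s link.
    for latency_ms in (1, 500):
        t = estimated_fetch_seconds(latency_ms, 10, 100e6, 100e6)
        print(f"latency {latency_ms} ms -> about {t:.2f} s of fetch time")
```

With these made-up inputs the transfer term stays at 1 s while the latency term
grows from 0.02 s to 10 s. Comparing an estimate like this against the measured
per-stage times would show whether shuffle fetches can account for the increase.
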
On Thu, Sep 22, 2016 at 5:11 PM Peter Figliozzi <pe...@gmail.com>
wrote:

> It seems to me they must communicate for joins, sorts, grouping, and so
> forth, where the original data partitioning needs to change.  You could
> repeat your experiment for different code snippets.  I'll bet it depends on
> what you do.
>

Re: Is executor computing time affected by network latency?

Posted by Peter Figliozzi <pe...@gmail.com>.
It seems to me they must communicate for joins, sorts, grouping, and so
forth, where the original data partitioning needs to change.  You could
repeat your experiment for different code snippets.  I'll bet it depends on
what you do.
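
One way to plan such a comparison is to split candidate jobs by whether they
change the partitioning. The sketch below is plain Python, not Spark code: the
two sets list some common RDD operations, and `needs_shuffle` is a hypothetical
helper. It simplifies one case: a join between RDDs that are already
co-partitioned may not need a shuffle.

```python
# Common RDD transformations that keep each record in its current partition.
NARROW_OPS = {"map", "filter", "flatMap", "mapPartitions", "mapValues"}

# Common RDD transformations that normally regroup data across partitions.
WIDE_OPS = {"groupByKey", "reduceByKey", "sortByKey", "join", "distinct", "repartition"}

KNOWN_OPS = NARROW_OPS | WIDE_OPS


def needs_shuffle(pipeline):
    """Return True if any operation in the pipeline normally triggers a shuffle."""
    unknown = [op for op in pipeline if op not in KNOWN_OPS]
    if unknown:
        raise ValueError(f"unclassified operations: {unknown}")
    return any(op in WIDE_OPS for op in pipeline)


if __name__ == "__main__":
    experiments = {
        "map only": ["map", "filter"],
        "aggregation": ["map", "reduceByKey"],
        "sorted join": ["join", "sortByKey"],
    }
    for name, ops in experiments.items():
        print(name, "-> shuffle expected:", needs_shuffle(ops))
```

Running the 1 ms vs 500 ms test once with a job from the first group and once
with a job from the second would show whether the slowdown follows the shuffle.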
