Posted to user@spark.apache.org by Simon Edelhaus <ed...@gmail.com> on 2015/08/02 00:24:19 UTC

TCP/IP speedup

Hi All!

How important would a significant performance improvement to TCP/IP itself be,
in terms of overall job performance? Which part would be accelerated most
significantly? Would it be HDFS?

-- ttfn
Simon Edelhaus
California 2015

Re: TCP/IP speedup

Posted by Steve Loughran <st...@hortonworks.com>.
On 1 Aug 2015, at 18:26, Ruslan Dautkhanov <da...@gmail.com> wrote:

If your network is bandwidth-bound, you may see that enabling jumbo frames (MTU 9000)
increases bandwidth by up to ~20%.

http://docs.hortonworks.com/HDP2Alpha/index.htm#Hardware_Recommendations_for_Hadoop.htm
"Enabling Jumbo Frames across the cluster improves bandwidth"

+1

you can also get better checksums of packets, so that the (very small but non-zero) risk of corrupted network packets drops a bit more.
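
A quick way to verify jumbo frames are actually in effect on a node is to read the interface MTU. A minimal sketch in Python, assuming a Linux host and an interface named eth0 (both assumptions; adjust for your hardware):

    from pathlib import Path

    IFACE = "eth0"        # hypothetical interface name; check your own NICs
    TARGET_MTU = 9000     # the jumbo-frame size under discussion

    # Linux exposes the configured MTU under /sys/class/net/<iface>/mtu
    mtu = int(Path("/sys/class/net/%s/mtu" % IFACE).read_text().strip())
    status = "enabled" if mtu >= TARGET_MTU else "NOT enabled"
    print("%s: MTU %d, jumbo frames %s" % (IFACE, mtu, status))

Bear in mind that every hop (NICs and switches) has to support the larger MTU end to end, or you trade the bandwidth gain for fragmentation or dropped packets.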


If the Spark workload is not network bandwidth-bound, I can see it being a few percent improvement to none at all.



Put differently: it shouldn't hurt. The shuffle phase is the most network-heavy; because a shuffle can span the entire cluster, the backbone ("bisection") bandwidth can become the bottleneck, and concurrent jobs can end up interfering with each other.
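
Since the shuffle is where the network pressure concentrates, cutting the amount of data shuffled usually buys more than a faster TCP stack would. A minimal PySpark sketch (the data and app name are illustrative) contrasting groupByKey, which ships every record across the network, with reduceByKey, which pre-aggregates per partition before shuffling:

    from pyspark import SparkContext

    sc = SparkContext(appName="shuffle-size-demo")   # hypothetical app name
    pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)] * 1000)

    # groupByKey: every (key, value) record crosses the network, then is summed.
    grouped_totals = pairs.groupByKey().mapValues(sum)

    # reduceByKey: values are combined within each partition first, so only one
    # partial sum per key per partition is shuffled.
    reduced_totals = pairs.reduceByKey(lambda a, b: a + b)

    print(sorted(reduced_totals.collect()))
    sc.stop()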

Scheduling work close to the HDFS data means that HDFS reads should often be node-local (with short-circuit reads the TCP stack gets bypassed entirely), or at least rack-local (sharing the switch, not any backbone).


But there are other things in play, as the slides discuss:


-stragglers: often a sign of a pending HDD failure, as reads get retried. The classic Hadoop MR engine detects these, can spin up alternate mappers (if you enable speculation; see the sketch after this list), and will blacklist the node for further work. Sometimes, though, that straggling is just unbalanced data -some bits of work may be computationally a lot harder, slowing things down.

-contention for work on the nodes. In YARN you request however many "virtual cores" you want (ops get to define the mapping of virtual to physical cores), with each node having a finite set of cores

but ...
  -Unless CPU throttling is turned on, competing processes can take up more CPU than they asked for.
  -that virtual:physical core setting may be off.
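
For the speculation knob mentioned above, here is a minimal PySpark sketch of turning it on. spark.speculation is the actual Spark setting; the multiplier and quantile values shown are just illustrative starting points to tune:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("speculation-demo")              # hypothetical app name
            .set("spark.speculation", "true")            # re-launch suspiciously slow tasks
            .set("spark.speculation.multiplier", "1.5")  # "slow" = this many times the median task time
            .set("spark.speculation.quantile", "0.75"))  # only after this fraction of tasks finish

    sc = SparkContext(conf=conf)
    # ... run the job as usual; stragglers get a speculative copy on another node ...
    sc.stop()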

There's also disk IOP contention; two jobs trying to get at the same spindle, even though there are lots of disks on the server. There's not much you can do about that (today).

A key takeaway from that talk, which applies to all performance-tuning talks, is: get data from your real workloads. There's some good htrace instrumentation in HDFS these days; I haven't looked at Spark's instrumentation to see how they hook up. You can also expect to have some network monitoring (sflow, ...) which you could use to see whether the backbone is overloaded. Don't forget the Linux tooling either, iotop &c. There's lots of room to play here -once you've got the data you can see where to focus, then decide how much time to spend trying to tune it.
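
For a quick host-level view while a job runs (before reaching for sflow or htrace), a minimal sketch using the psutil library -assuming psutil is installed on the node; the sampling window is illustrative:

    import time
    import psutil

    INTERVAL_S = 5  # illustrative sampling window, in seconds

    net_before = psutil.net_io_counters(pernic=True)
    disk_before = psutil.disk_io_counters()
    time.sleep(INTERVAL_S)
    net_after = psutil.net_io_counters(pernic=True)
    disk_after = psutil.disk_io_counters()

    # Per-NIC traffic over the window
    for nic, after in net_after.items():
        before = net_before[nic]
        sent_mb = (after.bytes_sent - before.bytes_sent) / 1e6
        recv_mb = (after.bytes_recv - before.bytes_recv) / 1e6
        print("%-10s sent %.1f MB, received %.1f MB in %ds" % (nic, sent_mb, recv_mb, INTERVAL_S))

    # Aggregate disk traffic over the window
    read_mb = (disk_after.read_bytes - disk_before.read_bytes) / 1e6
    write_mb = (disk_after.write_bytes - disk_before.write_bytes) / 1e6
    print("disk: read %.1f MB, wrote %.1f MB in %ds" % (read_mb, write_mb, INTERVAL_S))

It won't replace cluster-wide monitoring, but it's often enough to tell whether a node is network-bound, disk-bound, or neither while the shuffle runs.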

-steve


--
Ruslan Dautkhanov

On Sat, Aug 1, 2015 at 6:08 PM, Simon Edelhaus <ed...@gmail.com> wrote:
Hmmmm....

2% huh.


-- ttfn
Simon Edelhaus
California 2015

On Sat, Aug 1, 2015 at 3:45 PM, Mark Hamstra <ma...@clearstorydata.com> wrote:
https://spark-summit.org/2015/events/making-sense-of-spark-performance/

On Sat, Aug 1, 2015 at 3:24 PM, Simon Edelhaus <ed...@gmail.com> wrote:
Hi All!

How important would be a significant performance improvement to TCP/IP itself, in terms of
overall job performance improvement. Which part would be most significantly accelerated?
Would it be HDFS?

-- ttfn
Simon Edelhaus
California 2015





Re: TCP/IP speedup

Posted by Ruslan Dautkhanov <da...@gmail.com>.
If your network is bandwidth-bound, you may see that enabling jumbo frames
(MTU 9000) increases bandwidth by up to ~20%.

http://docs.hortonworks.com/HDP2Alpha/index.htm#Hardware_Recommendations_for_Hadoop.htm
"Enabling Jumbo Frames across the cluster improves bandwidth"

If the Spark workload is not network bandwidth-bound, I can see it being a few
percent improvement to none at all.



-- 
Ruslan Dautkhanov

On Sat, Aug 1, 2015 at 6:08 PM, Simon Edelhaus <ed...@gmail.com> wrote:

> Hmmmm....
>
> 2% huh.
>
>
> -- ttfn
> Simon Edelhaus
> California 2015
>
> On Sat, Aug 1, 2015 at 3:45 PM, Mark Hamstra <ma...@clearstorydata.com>
> wrote:
>
>> https://spark-summit.org/2015/events/making-sense-of-spark-performance/
>>
>> On Sat, Aug 1, 2015 at 3:24 PM, Simon Edelhaus <ed...@gmail.com> wrote:
>>
>>> Hi All!
>>>
>>> How important would be a significant performance improvement to TCP/IP
>>> itself, in terms of
>>> overall job performance improvement. Which part would be most
>>> significantly accelerated?
>>> Would it be HDFS?
>>>
>>> -- ttfn
>>> Simon Edelhaus
>>> California 2015
>>>
>>
>>
>

Re: TCP/IP speedup

Posted by Simon Edelhaus <ed...@gmail.com>.
Hmmmm....

2% huh.


-- ttfn
Simon Edelhaus
California 2015

On Sat, Aug 1, 2015 at 3:45 PM, Mark Hamstra <ma...@clearstorydata.com>
wrote:

> https://spark-summit.org/2015/events/making-sense-of-spark-performance/
>
> On Sat, Aug 1, 2015 at 3:24 PM, Simon Edelhaus <ed...@gmail.com> wrote:
>
>> Hi All!
>>
>> How important would be a significant performance improvement to TCP/IP
>> itself, in terms of
>> overall job performance improvement. Which part would be most
>> significantly accelerated?
>> Would it be HDFS?
>>
>> -- ttfn
>> Simon Edelhaus
>> California 2015
>>
>
>

Re: TCP/IP speedup

Posted by Michael Segel <ms...@hotmail.com>.
This may seem like a silly question… but in following Mark’s link, the presentation talks about the TPC-DS benchmark. 

Here’s my question… what benchmark results? 

If you go over to the TPC.org <http://tpc.org/> website they have no TPC-DS benchmarks listed. 
(Either audited or unaudited) 

So what gives? 

Note: There are TPCx-HS benchmarks listed… 

Thx

-Mike

> On Aug 1, 2015, at 5:45 PM, Mark Hamstra <ma...@clearstorydata.com> wrote:
> 
> https://spark-summit.org/2015/events/making-sense-of-spark-performance/
> 
> On Sat, Aug 1, 2015 at 3:24 PM, Simon Edelhaus <edelhas@gmail.com> wrote:
> Hi All!
> 
> How important would be a significant performance improvement to TCP/IP itself, in terms of 
> overall job performance improvement. Which part would be most significantly accelerated? 
> Would it be HDFS?
> 
> -- ttfn
> Simon Edelhaus
> California 2015
> 



Re: TCP/IP speedup

Posted by Mark Hamstra <ma...@clearstorydata.com>.
https://spark-summit.org/2015/events/making-sense-of-spark-performance/

On Sat, Aug 1, 2015 at 3:24 PM, Simon Edelhaus <ed...@gmail.com> wrote:

> Hi All!
>
> How important would be a significant performance improvement to TCP/IP
> itself, in terms of
> overall job performance improvement. Which part would be most
> significantly accelerated?
> Would it be HDFS?
>
> -- ttfn
> Simon Edelhaus
> California 2015
>