Posted to dev@spark.apache.org by atootoonchian <al...@levyx.com> on 2016/04/20 19:45:48 UTC

Improving system design logging in Spark

The current Spark logging mechanism can be improved by adding the following
metrics. They would help in understanding system bottlenecks and give Spark
application developers useful guidance for designing an optimized
application; a rough sketch of what the additions might look like follows
the list.

1. Shuffle Read Local Time: time for a task to read shuffle data from local
storage.
2. Shuffle Read Remote Time: time for a task to read shuffle data from a
remote node.
3. Distribution of processing time between computation, I/O, and network:
show how each task's processing time is split between computation, reading
data from disk, and reading data from the network.
4. Average I/O Bandwidth: average I/O throughput for each task when it
fetches data from disk.
5. Average Network Bandwidth: average network throughput for each task when
it fetches data from remote nodes.
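
Everything in the sketch below is hypothetical: the class and field names
are made up for illustration and are not part of Spark's TaskMetrics API.

    // Hypothetical accumulator for the proposed shuffle-read metrics.
    // All names are illustrative only; none of this exists in Spark today.
    class ProposedShuffleReadMetrics {
      private var localReadNanos = 0L    // #1: shuffle read local time
      private var remoteReadNanos = 0L   // #2: shuffle read remote time
      private var localBytesRead = 0L
      private var remoteBytesRead = 0L

      def incLocalRead(nanos: Long, bytes: Long): Unit = {
        localReadNanos += nanos; localBytesRead += bytes
      }
      def incRemoteRead(nanos: Long, bytes: Long): Unit = {
        remoteReadNanos += nanos; remoteBytesRead += bytes
      }

      // #4: average I/O bandwidth (bytes/sec) reading shuffle data from disk.
      def avgIoBandwidth: Double =
        if (localReadNanos == 0) 0.0 else localBytesRead / (localReadNanos / 1e9)

      // #5: average network bandwidth (bytes/sec) fetching from remote nodes.
      def avgNetworkBandwidth: Double =
        if (remoteReadNanos == 0) 0.0 else remoteBytesRead / (remoteReadNanos / 1e9)
    }

With #1 and #2 recorded per task, the distribution in #3 can be derived by
comparing them against the task's total run time.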







Re: Improving system design logging in Spark

Posted by Ali Tootoonchian <al...@levyx.com>.
Hi,

My point for #2 is distinguishing between how long it takes each task to
read the data from disk and how long it takes to transfer it through the
network to the target node. As far as I know (correct me if I'm wrong), the
block fetch time includes both the remote node reading the data and
transferring it to the requesting node. If the block time is bigger than we
expect, we cannot tell, from a system design point of view, which component
is the weakest link: storage or the network.
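
As a sketch of the diagnosis that split timings would enable (diskReadSecs
and netTransferSecs are assumed, separately measured values, not existing
Spark metrics):

    // Sketch only: compares hypothetical per-component timings for one fetch.
    def weakestLink(bytes: Long, diskReadSecs: Double, netTransferSecs: Double): String = {
      val diskBw = bytes / diskReadSecs     // effective bytes/sec off the remote disk
      val netBw  = bytes / netTransferSecs  // effective bytes/sec over the wire
      if (diskBw < netBw) f"storage is the weakest link ($diskBw%.0f B/s)"
      else f"network is the weakest link ($netBw%.0f B/s)"
    }

    // Example: a 1 GiB block that took 8 s to read from the remote disk
    // and 2 s to move over the wire points at storage, not the network.
    weakestLink(1L << 30, diskReadSecs = 8.0, netTransferSecs = 2.0)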






Re: Improving system design logging in Spark

Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi,

As for #1 and #2, it seems hard to capture remote and local fetch times
separately because the two overlap with each other: see
`ShuffleBlockFetcherIterator`.
IMO the current metric there (the time spent blocked taking fetched blocks
off a queue) is enough for most users, because remote fetching is the likely
bottleneck whenever that metric gets worse.
Is there any benefit to tracking the two times, remote and local, separately?
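
To make the overlap point concrete, here is a simplified sketch of the fetch
flow, not the real `ShuffleBlockFetcherIterator` code; the helper names and
timings are made up:

    import java.util.concurrent.LinkedBlockingQueue

    val results = new LinkedBlockingQueue[String]()

    // Remote fetch requests are issued up front and complete asynchronously.
    new Thread(() => (1 to 3).foreach { i =>
      Thread.sleep(50); results.put(s"remote-$i")
    }).start()

    // Local blocks are read while the remote fetches are still in flight,
    // so "local time" and "remote time" overlap in wall-clock terms.
    (1 to 3).foreach(i => results.put(s"local-$i"))

    // The existing metric measures only the time spent *blocked* here,
    // i.e. the shuffle fetch wait time.
    var fetchWaitNanos = 0L
    (1 to 6).foreach { _ =>
      val start = System.nanoTime()
      results.take()
      fetchWaitNanos += System.nanoTime() - start
    }
    println(s"fetch wait: ${fetchWaitNanos / 1e6} ms")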

// maropu


On Thu, Apr 21, 2016 at 2:47 AM, Ted Yu <yu...@gmail.com> wrote:

> Interesting.
>
> For #3:
>
> bq. reading data from,
>
> I guess you meant reading from disk.


-- 
---
Takeshi Yamamuro

Re: Improving system design logging in Spark

Posted by Ted Yu <yu...@gmail.com>.
Interesting.

For #3:

bq. reading data from,

I guess you meant reading from disk.
