Posted to issues@spark.apache.org by "jon (JIRA)" <ji...@apache.org> on 2016/06/17 22:59:05 UTC

[jira] [Commented] (SPARK-16022) Input size is different when I use 1 or 3 nodes but the shuffle size remains roughly equal, do you know why?

    [ https://issues.apache.org/jira/browse/SPARK-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337171#comment-15337171 ] 

jon commented on SPARK-16022:
-----------------------------

Hi, thanks for the correction. Where is that users list?

> Input size is different when I use 1 or 3 nodes but the shuffle size remains roughly equal, do you know why?
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-16022
>                 URL: https://issues.apache.org/jira/browse/SPARK-16022
>             Project: Spark
>          Issue Type: Test
>            Reporter: jon
>
> I ran some queries on Spark with just one node and then with 3 nodes, and in the Spark UI on port 4040 I see something I don't understand.
> For example, after executing a query with 3 nodes and checking the results in the Spark UI, the "Input" column shows 2.8 GB, so Spark read 2.8 GB from Hadoop. The same query with just one node in local mode shows 7.3 GB, so Spark read 7.3 GB from Hadoop. Shouldn't these values be equal?
> The shuffle values, by contrast, remain roughly equal with one node vs. 3. Why doesn't the input value stay the same? The same amount of data must be read from HDFS either way, so I don't understand.
> Do you know why?
> Single node:
> Input: 7.3 GB
> Shuffle read: 208.1 KB
> Shuffle write: 208.1 KB
> 3 nodes:
> Input: 2.8 GB
> Shuffle read: 193.3 KB
> Shuffle write: 208.1 KB
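
[Editor's note: a back-of-the-envelope check on the numbers above, purely as a hypothesis and not a confirmed explanation of the Spark UI's behavior. If the 7.3 GB input were split roughly evenly across 3 executors, each executor would read about 7.3 / 3 ≈ 2.43 GB, which is in the same ballpark as the 2.8 GB reported for the 3-node run.]

```python
# Hypothetical sanity check: per-executor share of the input if the
# 7.3 GB single-node read were split evenly across 3 executors.
# The closeness to the reported 2.8 GB is suggestive, not conclusive.
total_input_gb = 7.3   # "Input" reported for the single-node run
num_nodes = 3          # executors in the 3-node run

per_node_gb = total_input_gb / num_nodes
print(f"{per_node_gb:.2f} GB per node")  # prints "2.43 GB per node"
```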



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org