Posted to issues@spark.apache.org by "Tao Li (JIRA)" <ji...@apache.org> on 2015/12/07 16:22:10 UTC

[jira] [Comment Edited] (SPARK-12179) Spark SQL get different result with the same code

    [ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15045053#comment-15045053 ] 

Tao Li edited comment on SPARK-12179 at 12/7/15 3:21 PM:
---------------------------------------------------------

The query is on a Hive table, and the Hive data is not changing.

I think many factors could cause this problem, such as:
1. Are there differences between the Hadoop node environments?
2. Is there a bug in Spark shuffle?
3. Is there a classpath or jar version problem?
4. Is it a Hive compatibility problem?

I think I can make some progress by digging into the "shuffle write" number displayed on the web UI. Why is the shuffle write different between runs? How is the shuffle write number computed? What factors could make it differ?

I will work on this case and figure it out. [~srowen] If you have any idea or experience, please let me know. Thank you very much!
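One hypothetical explanation for the byte-size part of the discrepancy (this is an assumption, not confirmed from the issue): shuffle write bytes are typically reported after compression, and compressed size depends on the order in which records happen to be written. The sketch below uses plain Python with zlib as a stand-in for a compressed shuffle file, not Spark code, to show that compressing the same logical records in two different orders yields different byte counts. Note that this cannot explain the differing record counts (54934 vs 54905), which would point to the upstream computation itself being nondeterministic.

```python
# Illustrative sketch only: zlib stands in for Spark's shuffle-file
# compression. Same logical records, different write order, different
# compressed size.
import random
import zlib

# 50,000 synthetic records with a repeating pattern (compresses well in order).
records = [f"user_{i % 1000},{i * 37 % 500}".encode() for i in range(50_000)]

def compressed_size(recs):
    # Concatenate the records and compress, mimicking one compressed
    # shuffle output file; return its size in bytes.
    return len(zlib.compress(b"\n".join(recs)))

size_ordered = compressed_size(records)

shuffled = records[:]
random.Random(42).shuffle(shuffled)   # identical multiset, different order
size_shuffled = compressed_size(shuffled)

# The two sizes will generally differ even though the data is identical.
print(size_ordered, size_shuffled)
```

If something like this is the cause, the byte difference would be cosmetic; the differing record counts in the two runs are the more worrying signal and worth investigating first.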



> Spark SQL get different result with the same code
> -------------------------------------------------
>
>                 Key: SPARK-12179
>                 URL: https://issues.apache.org/jira/browse/SPARK-12179
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.3.0, 1.3.1, 1.3.2, 1.4.0, 1.4.1, 1.4.2, 1.5.0, 1.5.1, 1.5.2, 1.5.3
>         Environment: hadoop version: 2.5.0-cdh5.3.2
> spark version: 1.5.3
> run mode: yarn-client
>            Reporter: Tao Li
>            Priority: Minor
>
> I run the SQL in yarn-client mode, but get a different result each time.
> As the example shows, I get different shuffle write with the same shuffle read in two jobs running the same code.
> Some of my Spark apps run well, but some always hit this problem. I have seen it on Spark 1.3, 1.4, and 1.5.
> Can you give me some suggestions about the possible causes, or about how I can figure out the problem?
> 1. First Run
> Details for Stage 9 (Attempt 0)
> Total Time Across All Tasks: 5.8 min
> Shuffle Read: 24.4 MB / 205399
> Shuffle Write: 6.8 MB / 54934
> 2. Second Run
> Details for Stage 9 (Attempt 0)
> Total Time Across All Tasks: 5.6 min
> Shuffle Read: 24.4 MB / 205399
> Shuffle Write: 6.8 MB / 54905


