Posted to issues@spark.apache.org by "Yonghwan Lee (JIRA)" <ji...@apache.org> on 2018/08/19 13:24:00 UTC

[jira] [Updated] (SPARK-25156) Same query returns different result

     [ https://issues.apache.org/jira/browse/SPARK-25156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonghwan Lee updated SPARK-25156:
---------------------------------
    Description: 
I ran a query with two inner joins and two left outer joins over five tables.

Running the same query multiple times returns different results.

Table A
  
||Column a||Column b||Column c||Column d||
|Long(nullable: false)|Integer(nullable: false)|String(nullable: true)|String(nullable: false)|

Table B
||Column a||Column b||
|Long(nullable: false)|String(nullable: false)|

Table C
||Column a||Column b||
|Integer(nullable: false)|String(nullable: false)|

Table D
||Column a||Column b||Column c||
|Long(nullable: true)|Long(nullable: false)|Integer(nullable: false)|

Table E
||Column a||Column b||Column c||
|Long(nullable: false)|Integer(nullable: false)|String|

Query (Spark SQL)
{code:java}
select A.c, B.b, C.b, D.c, E.c
from A
inner join B on A.a = B.a
inner join C on A.b = C.a
left outer join D on A.d <=> cast(D.a as string)
left outer join E on D.b = E.a and D.c = E.b{code}
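
The join on D uses `<=>`, Spark SQL's null-safe equality operator, rather than `=`. The sketch below is only a rough model of the difference in plain Python, with SQL NULL represented as `None`. The names `sql_eq` and `null_safe_eq` are hypothetical helpers for illustration, not Spark APIs.

```python
from typing import Optional


def sql_eq(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """Model of SQL `=`: the result is NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a: Optional[str], b: Optional[str]) -> bool:
    """Model of SQL `<=>`: never NULL; two NULLs compare as equal."""
    if a is None or b is None:
        return a is None and b is None
    return a == b
```

Given the schema above, A.d is never NULL, so a NULL D.a should not match any row of A under either operator in this particular join.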
 

I ran the above query 10 times: it returned the correct result 7 times (count: 830001460) and an incorrect result 3 times (count: 830001299).
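
One way to quantify this kind of instability is to tally the row counts over repeated runs. The following is a minimal sketch, not part of the original report. `tally_counts`, `is_deterministic` and `run_query` are hypothetical names; in a PySpark session `run_query` could be something like `lambda: spark.sql(query).count()`.

```python
from collections import Counter
from typing import Callable, Dict


def tally_counts(run_query: Callable[[], int], runs: int = 10) -> Dict[int, int]:
    """Run the same query `runs` times and count how often each row count occurs."""
    return dict(Counter(run_query() for _ in range(runs)))


def is_deterministic(tally: Dict[int, int]) -> bool:
    """A deterministic query yields exactly one distinct row count."""
    return len(tally) == 1
```

For the runs reported above, the tally would contain two distinct counts, 830001460 (7 runs) and 830001299 (3 runs), which differ by 161 rows.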

 

Additionally, I set the number of shuffle partitions as follows:
{code:java}
sql("set spark.sql.shuffle.partitions=801"){code}
Is this a bug in Spark SQL?

> Same query returns different result
> -----------------------------------
>
>                 Key: SPARK-25156
>                 URL: https://issues.apache.org/jira/browse/SPARK-25156
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core
>    Affects Versions: 2.1.1
>         Environment: * Spark Version: 2.1.1
>  * Java Version: Java 7
>  * Scala Version: 2.11.8
>            Reporter: Yonghwan Lee
>            Priority: Major
>              Labels: Question


