Posted to issues@spark.apache.org by "Yonghwan Lee (JIRA)" <ji...@apache.org> on 2018/08/19 13:24:00 UTC
[jira] [Updated] (SPARK-25156) Same query returns different result
[ https://issues.apache.org/jira/browse/SPARK-25156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yonghwan Lee updated SPARK-25156:
---------------------------------
Description:
I performed two inner joins and two left outer joins on five tables.
Running the same query multiple times produces different results.
Table A
||Column a||Column b||Column c||Column d||
|Long(nullable: false)|Integer(nullable: false)|String(nullable: true)|String(nullable: false)|
Table B
||Column a||Column b||
|Long(nullable: false)|String(nullable: false)|
Table C
||Column a||Column b||
|Integer(nullable: false)|String(nullable: false)|
Table D
||Column a||Column b||Column c||
|Long(nullable: true)|Long(nullable: false)|Integer(nullable: false)|
Table E
||Column a||Column b||Column c||
|Long(nullable: false)|Integer(nullable: false)|String|
Query(Spark SQL)
{code:java}
select A.c, B.b, C.b, D.c, E.c
from A
inner join B on A.a = B.a
inner join C on A.b = C.a
left outer join D on A.d <=> cast(D.a as string)
left outer join E on D.b = E.a and D.c = E.b{code}
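The join on D uses {{<=>}}, Spark SQL's null-safe equality operator, which matters here because D.a is nullable so {{cast(D.a as string)}} can be NULL. A minimal Python sketch of the operator's semantics (illustration only, not Spark code; {{None}} stands in for SQL NULL):
{code:java}
# Sketch of Spark SQL's null-safe equality `<=>` in plain Python.
def null_safe_eq(x, y):
    # `a <=> b` is true when both sides are NULL, false when exactly one is,
    # and ordinary equality otherwise -- unlike `a = b`, which yields NULL
    # (treated as no match in a join) if either side is NULL.
    if x is None and y is None:
        return True
    if x is None or y is None:
        return False
    return x == y

print(null_safe_eq(None, None))  # True
print(null_safe_eq("1", None))   # False
print(null_safe_eq("1", "1"))    # True{code}
So rows where both A.d and the cast of D.a are NULL still join under {{<=>}}, whereas a plain {{=}} would drop them.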
I ran the above query 10 times; it returned the correct result (count: 830001460) 7 times and an incorrect result (count: 830001299) 3 times.
In addition, I executed:
{code:java}
sql("set spark.sql.shuffle.partitions=801"){code}
Is this a Spark SQL bug?
was:
I performed two inner joins and two left outer joins on five tables.
Running the same query multiple times produces different results.
Table A
||Column a||Column b||Column c||Column d||
|Long(nullable: false)|Integer(nullable: false)|String(nullable: true)|String(nullable: false)|
Table B
||Column a||Column b||
|Long(nullable: false)|String(nullable: false)|
Table C
||Column a||Column b||
|Integer(nullable: false)|String(nullable: false)|
Table D
||Column a||Column b||Column c||
|Long(nullable: true)|Long(nullable: false)|Integer(nullable: false)|
Table E
||Column a||Column b||Column c||
|Long(nullable: false)|Integer(nullable: false)|String|
Query(Spark SQL)
{code:java}
select A.c, B.b, C.b, D.c, E.c
from A
inner join B on A.a = B.a
inner join C on A.b = C.a
left outer join D on A.d <=> cast(D.a as string)
left outer join E on D.b = E.a and D.c = E.b{code}
I ran the above query 10 times; it returned the correct result (count: 830001460) 7 times and an incorrect result (count: 830001299) 3 times.
Is this a Spark SQL bug?
> Same query returns different result
> -----------------------------------
>
> Key: SPARK-25156
> URL: https://issues.apache.org/jira/browse/SPARK-25156
> Project: Spark
> Issue Type: Question
> Components: Spark Core
> Affects Versions: 2.1.1
> Environment: * Spark Version: 2.1.1
> * Java Version: Java 7
> * Scala Version: 2.11.8
> Reporter: Yonghwan Lee
> Priority: Major
> Labels: Question
>
> I performed two inner joins and two left outer joins on five tables.
> Running the same query multiple times produces different results.
> Table A
>
> ||Column a||Column b||Column c||Column d||
> |Long(nullable: false)|Integer(nullable: false)|String(nullable: true)|String(nullable: false)|
> Table B
> ||Column a||Column b||
> |Long(nullable: false)|String(nullable: false)|
> Table C
> ||Column a||Column b||
> |Integer(nullable: false)|String(nullable: false)|
> Table D
> ||Column a||Column b||Column c||
> |Long(nullable: true)|Long(nullable: false)|Integer(nullable: false)|
> Table E
> ||Column a||Column b||Column c||
> |Long(nullable: false)|Integer(nullable: false)|String|
> Query(Spark SQL)
> {code:java}
> select A.c, B.b, C.b, D.c, E.c
> from A
> inner join B on A.a = B.a
> inner join C on A.b = C.a
> left outer join D on A.d <=> cast(D.a as string)
> left outer join E on D.b = E.a and D.c = E.b{code}
>
> I ran the above query 10 times; it returned the correct result (count: 830001460) 7 times and an incorrect result (count: 830001299) 3 times.
>
> In addition, I executed:
> {code:java}
> sql("set spark.sql.shuffle.partitions=801"){code}
> Is this a Spark SQL bug?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org