You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Bruce Robbins (Jira)" <ji...@apache.org> on 2023/10/18 16:31:00 UTC
[jira] [Commented] (SPARK-45583) Spark SQL returning incorrect values for full outer join on keys with the same name.
[ https://issues.apache.org/jira/browse/SPARK-45583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776783#comment-17776783 ]
Bruce Robbins commented on SPARK-45583:
---------------------------------------
Strangely, I cannot reproduce. Is some setting required?
{noformat}
sql("select version()").show(false)
+----------------------------------------------+
|version() |
+----------------------------------------------+
|3.5.0 ce5ddad990373636e94071e7cef2f31021add07b|
+----------------------------------------------+
scala> sql("""WITH people as (
SELECT * FROM (VALUES
(1, 'Peter'),
(2, 'Homer'),
(3, 'Ned'),
(3, 'Jenny')
) AS Idiots(id, FirstName)
), location as (
SELECT * FROM (VALUES
(1, 'sample0'),
(1, 'sample1'),
(2, 'sample2')
) as Locations(id, address)
)SELECT
*
FROM
people
FULL OUTER JOIN
location
ON
people.id = location.id""").show(false)
| | | | | | | | | | | | | | | | | | | |
+---+---------+----+-------+
|id |FirstName|id |address|
+---+---------+----+-------+
|1 |Peter |1 |sample0|
|1 |Peter |1 |sample1|
|2 |Homer |2 |sample2|
|3 |Ned |NULL|NULL |
|3 |Jenny |NULL|NULL |
+---+---------+----+-------+
scala>
{noformat}
> Spark SQL returning incorrect values for full outer join on keys with the same name.
> ------------------------------------------------------------------------------------
>
> Key: SPARK-45583
> URL: https://issues.apache.org/jira/browse/SPARK-45583
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Huw
> Priority: Major
>
> {{The following query gives the wrong results.}}
>
> {{WITH people as (}}
> {{ SELECT * FROM (VALUES }}
> {{ (1, 'Peter'), }}
> {{ (2, 'Homer'), }}
> {{ (3, 'Ned'),}}
> {{ (3, 'Jenny')}}
> {{ ) AS Idiots(id, FirstName)}}
> {{{}){}}}{{{}, location as ({}}}
> {{ SELECT * FROM (VALUES}}
> {{ (1, 'sample0'),}}
> {{ (1, 'sample1'),}}
> {{ (2, 'sample2') }}
> {{ ) as Locations(id, address)}}
> {{{}){}}}{{{}SELECT{}}}
> {{ *}}
> {{FROM}}
> {{ people}}
> {{FULL OUTER JOIN}}
> {{ location}}
> {{ON}}
> {{ people.id = location.id}}
> {{We find the following table:}}
> ||id: integer||FirstName: string||id: integer||address: string||
> |2|Homer|2|sample2|
> |null|Ned|null|null|
> |null|Jenny|null|null|
> |1|Peter|1|sample0|
> |1|Peter|1|sample1|
> {{But clearly the first `id` column is wrong, the nulls should be 3.}}
> If we rename the id column in (only) the person table to pid we get the correct results:
> ||pid: integer||FirstName: string||id: integer||address: string||
> |2|Homer|2|sample2|
> |3|Ned|null|null|
> |3|Jenny|null|null|
> |1|Peter|1|sample0|
> |1|Peter|1|sample1|
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org