You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Huw (Jira)" <ji...@apache.org> on 2023/10/18 03:39:00 UTC

[jira] [Created] (SPARK-45583) Spark SQL returning incorrect values for full outer join on keys with the same name.

Huw created SPARK-45583:
---------------------------

             Summary: Spark SQL returning incorrect values for full outer join on keys with the same name.
                 Key: SPARK-45583
                 URL: https://issues.apache.org/jira/browse/SPARK-45583
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Huw


{{The following query gives the wrong results.}}

 

{{WITH people as (}}
{{  SELECT * FROM (VALUES }}
{{    (1, 'Peter'), }}
{{    (2, 'Homer'), }}
{{    (3, 'Ned'),}}
{{    (3, 'Jenny')}}
{{  ) AS Idiots(id, FirstName)}}
{{{}){}}}{{{}, location as ({}}}
{{  SELECT * FROM (VALUES}}
{{    (1, 'sample0'),}}
{{    (1, 'sample1'),}}
{{    (2, 'sample2')  }}
{{  ) as Locations(id, address)}}
{{{}){}}}{{{}SELECT{}}}
{{  *}}
{{FROM}}
{{  people}}
{{FULL OUTER JOIN}}
{{  location}}
{{ON}}
{{  people.id = location.id}}

{{We find the following table:}}
{{}}
||idinteger||FirstNamestring||idinteger||addressstring||
|2|Homer|2|sample2|
|null|Ned|null|null|
|null|Jenny|null|null|
|1|Peter|1|sample0|
|1|Peter|1|sample1|

{{But clearly the first `id` table is wrong, the nulls should be 3.}}

If we rename the id column in (only) the person table to pid we get the correct results:


||pidinteger||FirstNamestring||idinteger||addressstring||
|2|Homer|2|sample2|
|3|Ned|null|null|
|3|Jenny|null|null|
|1|Peter|1|sample0|
|1|Peter|1|sample1|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org