Posted to issues@spark.apache.org by "Clément de Groc (Jira)" <ji...@apache.org> on 2023/04/01 05:28:00 UTC

[jira] [Commented] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values

    [ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707502#comment-17707502 ] 

Clément de Groc commented on SPARK-37829:
-----------------------------------------

I'm not planning to resume work on this. I don't know that part of the codebase well enough to submit a better fix than the one I already proposed in my PR.

> An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-37829
>                 URL: https://issues.apache.org/jira/browse/SPARK-37829
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>            Reporter: Clément de Groc
>            Priority: Major
>
> Doing an outer-join using {{joinWith}} on {{DataFrame}}s used to return missing values as {{null}} in Spark 2.4.8, but returns them as {{Row}}s with {{null}} fields in Spark 3+.
> The issue can be reproduced with [the following test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0.
> The problem only arises when working with DataFrames: Datasets of case classes work as expected as demonstrated by [this other test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223].
> I couldn't find an explanation for this change in the Migration guide so I'm assuming this is a bug.
> A {{git bisect}} pointed me to [that commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59].
> Reverting the commit solves the problem.
> A similar solution, without reverting the commit, is shown [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a].
> Happy to help if you think of another approach / can provide some guidance.
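For readers skimming the archive, the reported behavior can be sketched as follows. This is a minimal illustration, not the exact linked test; the column names and data are made up, and the commented outputs reflect what the report describes for each Spark version:

```scala
import org.apache.spark.sql.SparkSession

object JoinWithRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Two plain DataFrames (not Datasets of case classes); hypothetical data.
    val left  = Seq((1, "a"), (2, "b")).toDF("id", "value")
    val right = Seq((1, "x")).toDF("id", "value")

    // Outer join via joinWith: row with id=2 has no match on the right.
    val joined = left.joinWith(right, left("id") === right("id"), "left_outer")

    // Per the report:
    //   Spark 2.4.8  -> the unmatched right side is null:        ([2,b], null)
    //   Spark 3.0.0+ -> it is a Row of null fields instead: ([2,b], [null,null])
    joined.collect().foreach(println)

    spark.stop()
  }
}
```

Datasets of case classes are unaffected: with `Seq((1, "a")).toDS()` over a case class, the unmatched side still comes back as `null`, which is what the linked DatasetSuite test exercises.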



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org