
Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2023/07/21 04:26:00 UTC

[jira] (SPARK-44464) Fix applyInPandasWithStatePythonRunner to output rows that have Null as first column value

    [ https://issues.apache.org/jira/browse/SPARK-44464 ]


    Jungtaek Lim deleted comment on SPARK-44464:
    --------------------------------------

was (Author: JIRAUSER300009):
[~kabhwan] Should we backport it all the way to 11.3, or is it OK to fix only newer versions?

> Fix applyInPandasWithStatePythonRunner to output rows that have Null as first column value
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-44464
>                 URL: https://issues.apache.org/jira/browse/SPARK-44464
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.3.3
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>            Priority: Major
>             Fix For: 3.5.0
>
>
> The current implementation of {{ApplyInPandasWithStatePythonRunner}} cannot handle output rows whose first column is {{null}}. It cannot distinguish a column that is genuinely null from a field that was padded with null because the number of data records is smaller than the number of state records. As a result, it produces incorrect results in the former case.
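
The ambiguity can be illustrated with a small, hypothetical Python model. It does not reproduce Spark's Arrow layout or the code of ApplyInPandasWithStatePythonRunner. The names pack, unpack_by_first_column and unpack_by_count are made up for this sketch, and padding is assumed to be a data row in which every field is None. Passing an explicit row count is only one way to remove the ambiguity and is not necessarily what the actual fix does.

```python
from itertools import zip_longest
from typing import List, Optional, Tuple

# Hypothetical model, not Spark's implementation: a data row has two fields,
# and each output record pairs one data row with one state value.
DataRow = Tuple[Optional[int], Optional[str]]
Record = Tuple[DataRow, Optional[str]]

# Assumed padding convention: a missing data row becomes a row of all-None fields.
NULL_DATA_ROW: DataRow = (None, None)


def pack(data_rows: List[DataRow], state_rows: List[str]) -> List[Record]:
    """Pair data rows with state rows, padding whichever side is shorter."""
    return [
        (d if d is not None else NULL_DATA_ROW, s)
        for d, s in zip_longest(data_rows, state_rows)
    ]


def unpack_by_first_column(packed: List[Record]) -> List[DataRow]:
    """Ambiguous reader: treats any row whose first field is None as padding,
    so a real row such as (None, "b") is silently dropped."""
    return [d for d, _ in packed if d[0] is not None]


def unpack_by_count(packed: List[Record], num_data_rows: int) -> List[DataRow]:
    """Unambiguous reader: the sender states how many data rows are real,
    so a null first column is never mistaken for padding."""
    return [d for d, _ in packed[:num_data_rows]]


if __name__ == "__main__":
    data = [(1, "a"), (None, "b")]  # second row legitimately starts with null
    state = ["s1", "s2", "s3"]      # more state rows than data rows
    packed = pack(data, state)
    print(unpack_by_first_column(packed))       # [(1, 'a')]
    print(unpack_by_count(packed, len(data)))   # [(1, 'a'), (None, 'b')]
```

Keeping padding distinguishable at the row level (for example, a row-level null rather than a row of null fields) would be another way to avoid relying on the value of the first column.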



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
