Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2022/08/13 05:40:00 UTC

[jira] [Commented] (SPARK-40063) pyspark.pandas .apply() changing rows ordering

    [ https://issues.apache.org/jira/browse/SPARK-40063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579213#comment-17579213 ] 

Hyukjin Kwon commented on SPARK-40063:
--------------------------------------

{quote}
 it ends up mixing the column's rows ordering.
{quote}

Can you show the expected and actual output? What do you mean by the column's row ordering?
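As a rough illustration of what an expected-output sample could look like, here is a minimal sketch that runs the reporter's snippet in plain pandas, where row order is preserved. The sample data and the `id` column are hypothetical (the report includes no data), and `pdf` and `expected` are names chosen only for this example. The actual output would come from running the same assignment on a pandas-on-Spark DataFrame and comparing the two.

```python
import pandas as pd


def example_func(value):
    # Same operation as in the report: square the value.
    return value ** 2


# Hypothetical sample data; the original report does not include any.
pdf = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "row_to_apply_function": [1, 2, 3, 4, 5],
})

# Plain pandas applies the function row by row and keeps the row order,
# so this result can serve as the expected output in a bug report.
expected = pdf.copy()
expected["row_to_apply_function"] = expected.apply(
    lambda row: example_func(row["row_to_apply_function"]), axis=1
)

print(expected)
```

Keeping a stable key column such as `id` in the sample makes any reordering visible, because each squared value can be matched back to its original row.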

> pyspark.pandas .apply() changing rows ordering
> ----------------------------------------------
>
>                 Key: SPARK-40063
>                 URL: https://issues.apache.org/jira/browse/SPARK-40063
>             Project: Spark
>          Issue Type: Bug
>          Components: Pandas API on Spark
>    Affects Versions: 3.3.0
>         Environment: Databricks Runtime 11.1
>            Reporter: Marcelo Rossini Castro
>            Priority: Minor
>              Labels: Pandas, PySpark
>
> When using the apply function to apply a function to a DataFrame column, it ends up mixing the column's rows ordering.
> A command like this:
> {code:python}
> def example_func(df_col):
>     return df_col ** 2
>
> df['row_to_apply_function'] = df.apply(lambda row: example_func(row['row_to_apply_function']), axis=1)
> {code}
> A workaround is to assign the results to a new column instead of the same one, but if the old column is then dropped, the same problem occurs.
> Setting one column as index also didn't work.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
