Posted to issues@spark.apache.org by "Xinrong Meng (Jira)" <ji...@apache.org> on 2021/05/07 18:48:00 UTC

[jira] [Updated] (SPARK-35337) pandas APIs on Spark: Separate basic operations into data type based structures

     [ https://issues.apache.org/jira/browse/SPARK-35337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinrong Meng updated SPARK-35337:
---------------------------------
    Description: 
Currently, each basic operation is defined in a single function that handles all data types, which makes it difficult to extend or change behavior that depends on the data type. For example, the binary operation Series + Series behaves differently depending on the data type: it adds numeric operands, concatenates string operands, and so on. These differences are implemented as if-else branches inside the function, which makes the logic messy and difficult to maintain or reuse.

We should provide an infrastructure to manage the differences in these operations.
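
As a rough illustration of what such an infrastructure could look like, the sketch below moves the per-type behavior of `+` out of an if-else chain and into one class per data type. It is not the proposed Spark implementation: it runs on plain Python lists, and every name in it (`TypeOps`, `NumericTypeOps`, `StringTypeOps`, `ops_for`, `ToySeries`) is a hypothetical stand-in chosen for this example.

```python
from numbers import Number


class TypeOps:
    """Hypothetical base class: holds the basic operations for one data type."""

    def add(self, left, right):
        # Default behavior: the data type does not support addition.
        raise TypeError(f"addition is not defined for {type(self).__name__}")


class NumericTypeOps(TypeOps):
    """Addition for numeric operands is element-wise arithmetic."""

    def add(self, left, right):
        if not all(isinstance(v, Number) for v in right):
            raise TypeError("numeric values can only be added to numeric values")
        return [a + b for a, b in zip(left, right)]


class StringTypeOps(TypeOps):
    """Addition for string operands is element-wise concatenation."""

    def add(self, left, right):
        if not all(isinstance(v, str) for v in right):
            raise TypeError("strings can only be concatenated with strings")
        return ["".join(pair) for pair in zip(left, right)]


def ops_for(values):
    """Choose the operations object once, based on the first element's type."""
    if not values:
        return TypeOps()
    first = values[0]
    if isinstance(first, str):
        return StringTypeOps()
    if isinstance(first, Number):
        return NumericTypeOps()
    return TypeOps()


class ToySeries:
    """Hypothetical stand-in for a Series, backed by a plain list."""

    def __init__(self, values):
        self.values = list(values)
        self._ops = ops_for(self.values)

    def __add__(self, other):
        # No if-else on the data type here: the operator delegates to the
        # operations object that was selected when the series was created.
        return ToySeries(self._ops.add(self.values, other.values))
```

With this layout, `(ToySeries([1, 2]) + ToySeries([3, 4])).values` is `[4, 6]` and `(ToySeries(["a", "b"]) + ToySeries(["x", "y"])).values` is `["ax", "by"]`, while adding a numeric series to a string series raises `TypeError`. Supporting a new data type would mean adding one more subclass rather than another branch in a shared function.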

Please refer to "pandas APIs on Spark: Separate basic operations into data type based structures" for details.

> pandas APIs on Spark: Separate basic operations into data type based structures
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-35337
>                 URL: https://issues.apache.org/jira/browse/SPARK-35337
>             Project: Spark
>          Issue Type: Umbrella
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Xinrong Meng
>            Priority: Major


