Posted to issues@spark.apache.org by "A S (Jira)" <ji...@apache.org> on 2022/01/01 02:25:00 UTC

[jira] [Created] (SPARK-37798) Pyspark Pandas API: Cross and Conditional Merging

A S created SPARK-37798:
---------------------------

             Summary: Pyspark Pandas API: Cross and Conditional Merging
                 Key: SPARK-37798
                 URL: https://issues.apache.org/jira/browse/SPARK-37798
             Project: Spark
          Issue Type: New Feature
          Components: PySpark
    Affects Versions: 3.2.0
            Reporter: A S


Pandas currently supports a `how="cross"` merge, which produces the Cartesian product of the left and right tables. In Spark this can be achieved with `pyspark.sql.DataFrame.join(..., on=None, how="inner")`.
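
For reference, a minimal sketch of the pandas behaviour described above, assuming pandas 1.2 or later (the release that introduced `how="cross"`). The frames and column names are made up for illustration.

```python
import pandas as pd

# Illustrative data only; not taken from the issue.
left = pd.DataFrame({"a": [1, 2]})
right = pd.DataFrame({"b": ["x", "y", "z"]})

# A cross merge pairs every left row with every right row,
# so the result has len(left) * len(right) rows.
crossed = left.merge(right, how="cross")
print(crossed)
```

The request is for the pandas API on Spark to accept the same `how="cross"` argument.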

Additionally, I am currently adding conditional merging to the pandas API (see PR: https://github.com/pandas-dev/pandas/pull/42964). This is much easier to achieve in Spark, since the functionality is already available and we can trivially expose it in the PySpark pandas API. Given the demand for this functionality (countless Stack Overflow questions and pandas issues either ask how to do this or would be solved by it), I think it would be worth adding even before it makes it into the core pandas API.
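
To make "conditional merging" concrete, here is a sketch of the semantics using the common pandas workaround of a cross merge followed by a filter. It is not the API proposed in the PR; the helper name `conditional_merge` and the sample data are hypothetical, and pandas 1.2 or later is assumed.

```python
import pandas as pd


def conditional_merge(left, right, condition):
    """Hypothetical helper: keep the (left, right) row pairs where condition holds."""
    # Build every row pair first, then filter with the boolean mask.
    pairs = left.merge(right, how="cross")
    return pairs[condition(pairs)].reset_index(drop=True)


# Illustrative data only: match each event to the window containing its timestamp.
events = pd.DataFrame({"event": ["e1", "e2"], "ts": [5, 15]})
windows = pd.DataFrame({"window": ["w1", "w2"], "start": [0, 10], "end": [10, 20]})

matched = conditional_merge(
    events,
    windows,
    lambda df: (df["ts"] >= df["start"]) & (df["ts"] < df["end"]),
)
print(matched)
```

This workaround materializes the full Cartesian product before filtering, so it is only practical for small inputs. Spark's `DataFrame.join` accepts a column expression as the join condition, which is why exposing this through the pandas API on Spark should be straightforward.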

I will open a PR that includes both shortly.


