Posted to reviews@spark.apache.org by "EnricoMi (via GitHub)" <gi...@apache.org> on 2023/02/07 08:25:19 UTC
[GitHub] [spark] EnricoMi commented on pull request #39902: [SPARK-42349][PYTHON]Support pandas cogroup with multiple df
EnricoMi commented on PR #39902:
URL: https://github.com/apache/spark/pull/39902#issuecomment-1420377190
Excellent work. I would strongly recommend two things:
- Let's make the existing CoGroup code handle many dataframes, so that a lot of code does not get duplicated.
- Let's always expect the first argument of the UDF to be the key. That simplifies things, and always providing the key adds little overhead.
But let's first hear whether Spark committers are happy to approve either.
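
To make the second suggestion concrete, here is a rough sketch of the calling convention: the key always comes first, followed by one group per input. It uses plain Python lists of dicts instead of pandas DataFrames so it runs without Spark. `cogroup_apply` and `count_rows` are hypothetical names invented for illustration; they are not part of PySpark and not the code in this PR.

```python
from collections import defaultdict


def cogroup_apply(udf, key_name, *datasets):
    """Hypothetical helper: group every dataset by key_name, then call
    udf(key, *groups) once per distinct key, with one group per dataset."""
    grouped = []
    for rows in datasets:
        groups = defaultdict(list)
        for row in rows:
            # The key is a tuple, as it is for the existing key-taking UDFs.
            groups[(row[key_name],)].append(row)
        grouped.append(groups)

    # Every key that occurs in at least one dataset.
    keys = sorted(set().union(*grouped))
    # A dataset without rows for a key contributes an empty group.
    return [udf(key, *(g.get(key, []) for g in grouped)) for key in keys]


def count_rows(key, *groups):
    # The key is always the first argument; the number of groups is variable.
    return {"id": key[0], "counts": [len(g) for g in groups]}


left = [{"id": 1, "v": 10}, {"id": 2, "v": 20}]
middle = [{"id": 1, "w": "a"}, {"id": 1, "w": "b"}]
right = [{"id": 2, "x": 0.5}]

result = cogroup_apply(count_rows, "id", left, middle, right)
print(result)
```

With the key fixed in first position, the caller never has to inspect the UDF's argument count to decide whether to pass the key, and the same signature works for two inputs or for many.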
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org