Posted to reviews@spark.apache.org by "EnricoMi (via GitHub)" <gi...@apache.org> on 2023/02/07 08:25:19 UTC

[GitHub] [spark] EnricoMi commented on pull request #39902: [SPARK-42349][PYTHON]Support pandas cogroup with multiple df

EnricoMi commented on PR #39902:
URL: https://github.com/apache/spark/pull/39902#issuecomment-1420377190

   Excellent work. I would strongly recommend two things:
   - Let's make the existing CoGroup code handle many DataFrames, so that much of the code does not have to be duplicated.
   - Let's always expect the first argument of the UDF to be the key; this simplifies things, and always providing the key adds little overhead.
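
   To illustrate the two points above, here is a minimal sketch in plain pandas (no Spark involved). The helper `cogroup_apply` and the UDF `count_rows` are hypothetical names made up for this example and are not part of the PR or of PySpark; the key is passed as a tuple on the assumption that this mirrors how PySpark hands grouping keys to pandas UDFs.

```python
import pandas as pd


def cogroup_apply(func, key_col, *dfs):
    """Hypothetical helper: call func(key, *groups) once per distinct key."""
    # Collect every key that appears in at least one input frame.
    keys = sorted(set().union(*(df[key_col].unique() for df in dfs)))
    results = []
    for key in keys:
        # A frame without this key contributes an empty group (same columns).
        groups = [df[df[key_col] == key] for df in dfs]
        # The key always comes first, followed by one group per input frame.
        results.append(func((key,), *groups))
    return pd.concat(results, ignore_index=True)


def count_rows(key, *groups):
    # One signature covers any number of cogrouped frames.
    return pd.DataFrame({"id": [key[0]], "rows": [sum(len(g) for g in groups)]})


df1 = pd.DataFrame({"id": [1, 1, 2], "a": [10, 11, 12]})
df2 = pd.DataFrame({"id": [1, 3], "b": [20, 21]})
df3 = pd.DataFrame({"id": [2, 3, 3], "c": [30, 31, 32]})

result = cogroup_apply(count_rows, "id", df1, df2, df3)
print(result)
```

   With the key always in first position, the same UDF signature works for two, three, or more inputs, so the caller does not need to inspect the function's argument count to decide whether to pass the key.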
   
   But let's first hear whether Spark committers are happy to approve either.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

