
Posted to reviews@spark.apache.org by "zhengruifeng (via GitHub)" <gi...@apache.org> on 2023/03/23 02:22:42 UTC

[GitHub] [spark] zhengruifeng commented on pull request #40520: [SPARK-42896][SQL][PYSPARK] Make `mapInPandas` / `mapInArrow` support barrier mode execution

zhengruifeng commented on PR #40520:
URL: https://github.com/apache/spark/pull/40520#issuecomment-1480490580

   > Barrier mode is only used in a specific ML case, i.e. in the model training routine, where we only use it in one pattern:
   > 
   > dataset.mapInPandas(..., is_barrier=True).collect()
   
   > To simplify the implementation, we can implement a barrierMapInPandasAndCollect instead, and define an execution plan stage like BarrierMapInPandasAndCollectExec
   
   If it is the only use case, I think it will be safe to add a dedicated logical plan and physical plan for it.
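   To illustrate the "barrier map, then collect" pattern discussed above: in barrier mode all tasks of a stage are launched together and can synchronize with one another, which is what distributed training routines need. The sketch below is only a plain-Python analogy that uses threads in place of Spark tasks. `barrier_map_and_collect` is a hypothetical helper, not a Spark API, and it says nothing about how a physical node such as BarrierMapInPandasAndCollectExec would actually be implemented.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def barrier_map_and_collect(partitions, func):
    """Hypothetical analogy (not Spark): apply func to every partition
    concurrently, make every task wait at a shared barrier, then gather
    the per-partition results in partition order (like collect())."""
    n = len(partitions)
    if n == 0:
        return []
    barrier = threading.Barrier(n)

    def run_task(part):
        local = func(part)  # per-partition work, e.g. one local training step
        barrier.wait()      # no task proceeds until all n tasks have arrived
        return local

    # All n "tasks" must run at the same time; with fewer workers,
    # barrier.wait() would block forever (gang scheduling is required).
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(run_task, partitions))


print(barrier_map_and_collect([[1, 2], [3, 4], [5]], sum))
```

   The worker count equals the partition count on purpose. This mirrors why barrier stages need all of their tasks scheduled at once.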


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: reviews-help@spark.apache.org