Posted to reviews@spark.apache.org by HeartSaVioR <gi...@git.apache.org> on 2018/08/15 07:26:50 UTC

[GitHub] spark issue #21718: [SPARK-24744][STRUCTURED STREAMING] Set the SparkSessio...

Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/21718
  
    @bjkonglu @bethunebtj @wguangliang 
    
    Update: I thought about splitting execution tasks from data partitions (`spark.sql.shuffle.partitions`), and it turns out this can be achieved by calling `coalesce`. With `coalesce` you can reduce the number of execution tasks while the number of data partitions stays the same. Please note that we still can't change `spark.sql.shuffle.partitions`, since repartitioning state is non-trivial depending on the size of the state.
    
    One thing to note is that execution tasks will be reduced for downstream operators as well (unless a new stage begins), so you may need to call `repartition` to restore the number of execution tasks for the downstream operators.
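    A rough sketch of what this could look like. This is not from the PR itself; the `rate` source, the windowed count, and the concrete partition/task numbers are all assumptions for illustration:
    
    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    
    val spark = SparkSession.builder()
      .appName("coalesce-sketch")
      // state for the aggregation is split into 200 data partitions
      .config("spark.sql.shuffle.partitions", "200")
      .getOrCreate()
    import spark.implicits._
    
    // hypothetical streaming source for illustration
    val input = spark.readStream.format("rate").load()
    
    val counts = input
      .groupBy(window($"timestamp", "1 minute"))
      .count()          // stateful aggregation over 200 data partitions
      .coalesce(10)     // run those 200 partitions in only 10 execution tasks;
                        // the number of data (state) partitions is unchanged
      .repartition(50)  // new shuffle: downstream operators get their own
                        // parallelism instead of inheriting the coalesced 10
    
    counts.writeStream
      .format("console")
      .outputMode("complete")
      .start()
    ```
    
    Without the trailing `repartition`, everything after `coalesce(10)` in the same stage would also run as 10 tasks.
    
    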


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org