Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/06 15:53:17 UTC

[GitHub] viirya opened a new pull request #23740: [SPARK-26837][SQL] Pruning nested fields from object serializers

URL: https://github.com/apache/spark/pull/23740
 
 
   ## What changes were proposed in this pull request?
   
   SPARK-26619 changed object serialization to prune unnecessary individual serializers. This PR extends SPARK-26619: we can further prune nested fields from object serializers if they are not used.
   
   For example, the following query uses only one field of a struct column:
   
   ```scala
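   // Assumes a SparkSession named `spark` is in scope (e.g. in spark-shell).
   import spark.implicits._
   val data = Seq((("a", 1), 1), (("b", 2), 2), (("c", 3), 3))
   val df = data.toDS().map(t => (t._1, t._2 + 1)).select("_1._1")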
   ```
   
   So, instead of keeping a serializer that creates a two-field struct, we can prune the unused field from it. This is what this PR proposes to do.
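   
   The idea can be illustrated with a toy model that is independent of Spark. This is only a sketch, not the optimizer code in this PR: `FieldType`, `Leaf`, `Struct`, and `prune` are hypothetical names, and a schema is modeled as a plain tree of named fields. Given the set of field paths a query actually reads, `prune` keeps only the fields on those paths.
   
   ```scala
   // Toy model of nested-field pruning; NOT Spark's optimizer code.
   // All names here (FieldType, Leaf, Struct, prune) are hypothetical.
   sealed trait FieldType
   case class Leaf(typeName: String) extends FieldType
   case class Struct(fields: List[(String, FieldType)]) extends FieldType
   
   object NestedPruning {
     // Keep only the fields reachable through the requested paths.
     // Each path is a list of field names, e.g. List("_1", "_1") for "_1._1".
     def prune(schema: Struct, paths: Set[List[String]]): Struct = {
       val kept = schema.fields.flatMap { case (name, tpe) =>
         // Paths that start at this field, with the leading segment dropped.
         val rest = paths.collect { case head :: tail if head == name => tail }
         if (rest.isEmpty) None                           // field is never referenced
         else if (rest.contains(Nil)) Some(name -> tpe)   // whole field is requested
         else tpe match {
           case s: Struct => Some(name -> prune(s, rest)) // recurse into the struct
           case leaf      => Some(name -> leaf)
         }
       }
       Struct(kept)
     }
   }
   ```
   
   For the query above, the mapped rows have the shape `(_1: (_1: string, _2: int), _2: int)` and only the path `_1._1` is read, so `NestedPruning.prune(schema, Set(List("_1", "_1")))` leaves a single-field struct nested under `_1`.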
   
   To keep this change conservative and safe, it is guarded by a new SQL config, which is disabled by default.
   
   TODO: Support pruning nested fields inside MapType keys and values.
   
   ## How was this patch tested?
   
   Added tests.
