Posted to user@spark.apache.org by Eric Friedman <er...@gmail.com> on 2015/07/28 16:47:03 UTC

projection optimization?

If I have a Hive table with six columns and create a DataFrame (Spark
1.4.1) using a sqlContext.sql("select * from ...") query, the physical
plan shown by explain() reads all six columns, as expected.

If I then call select("one_column") on that first DataFrame, the resulting
DataFrame still shows a physical plan that fetches all six columns.
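To make the scenario concrete, here is a minimal sketch of what I'm doing
(the table and column names are placeholders, assuming a HiveContext is
already set up as sqlContext):

    // Sketch only -- "my_table" and "one_column" are illustrative names.
    val df = sqlContext.sql("select * from my_table")
    df.explain()  // physical plan scans all six columns, as expected

    val pruned = df.select("one_column")
    pruned.explain()  // still shows a scan of all six columns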

Shouldn't the subsequent select() have pruned the projection down to one
column in the physical plan?