Posted to issues@spark.apache.org by "Eyal Farago (JIRA)" <ji...@apache.org> on 2019/07/08 14:22:00 UTC

[jira] [Created] (SPARK-28304) FileFormatWriter introduces an unconditional sort, even when all attributes are constants

Eyal Farago created SPARK-28304:
-----------------------------------

             Summary: FileFormatWriter introduces an unconditional sort, even when all attributes are constants
                 Key: SPARK-28304
                 URL: https://issues.apache.org/jira/browse/SPARK-28304
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.2
            Reporter: Eyal Farago


FileFormatWriter derives a required sort order from the partition columns, the bucketing columns, and any explicitly required ordering. However, in some use cases some (or even all) of these fields are constant, and in those cases the sort can be skipped.
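To make the mechanism concrete, here is a simplified toy model in Python of how a required ordering might be assembled and compared with the ordering of the incoming data. It is not Spark's code: the function names `required_ordering` and `needs_sort` are hypothetical, columns are plain strings, and bucketing is reduced to a list of column names. The point is only that a non-empty required ordering forces a sort whenever the input is not already ordered that way, whether or not the columns are constant.

```python
from typing import List, Sequence


def required_ordering(partition_cols: Sequence[str],
                      bucket_cols: Sequence[str],
                      sort_cols: Sequence[str]) -> List[str]:
    """Concatenate the three sources of ordering named in the report.

    Assumption: columns are modelled as plain names, not expressions.
    """
    return list(partition_cols) + list(bucket_cols) + list(sort_cols)


def needs_sort(required: Sequence[str], actual: Sequence[str]) -> bool:
    """A sort is needed unless `required` is a prefix of the input ordering."""
    if len(required) > len(actual):
        return True
    return any(r != a for r, a in zip(required, actual))


# One partition column, no bucketing, no explicit ordering.
req = required_ordering(["batch_id"], [], [])
print(needs_sort(req, actual=[]))            # input has no known ordering
print(needs_sort(req, actual=["batch_id"]))  # input already ordered by batch_id
```

In this model the first call reports that a sort is required even if `batch_id` holds a single value in every row, which is the behaviour the issue asks to avoid.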

For example, in my use case we add a GUID column identifying a specific (incremental) load; it can be thought of as a batch id. Since we run one batch at a time, this column is always constant, so there is no need to sort on it. And since we don't use bucketing or require an explicit ordering, the entire sort can be skipped in our case.

 

I suggest:
 # Filter constant columns out of the required ordering calculated by FileFormatWriter.
 # Generalize this to any Sort operator in a Spark plan.
 # Introduce optimizer rules that remove constants from sort orderings, potentially eliminating the Sort operator altogether.
 # Modify EnsureRequirements to be aware of constant fields when deciding whether to introduce a sort.
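The first and third suggestions can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `prune_constants` and `sort_if_needed` are hypothetical names, rows are dictionaries, and the set of constant columns is handed in by the caller, whereas in Spark it would have to be inferred from the plan. Dropping a constant column from a sort key cannot change the result, because that column compares equal for every pair of rows.

```python
from typing import Any, Dict, Iterable, List, Sequence

Row = Dict[str, Any]


def prune_constants(ordering: Sequence[str],
                    constant_cols: Iterable[str]) -> List[str]:
    """Drop columns known to be constant from a sort ordering."""
    constants = set(constant_cols)
    return [col for col in ordering if col not in constants]


def sort_if_needed(rows: List[Row],
                   ordering: Sequence[str],
                   constant_cols: Iterable[str]) -> List[Row]:
    """Sort only by the non-constant columns; skip the sort if none remain."""
    effective = prune_constants(ordering, constant_cols)
    if not effective:
        return rows  # nothing left to sort by: the sort is eliminated
    return sorted(rows, key=lambda row: tuple(row[c] for c in effective))


rows = [
    {"batch_id": "load-1", "value": 3},
    {"batch_id": "load-1", "value": 1},
]

# Only the constant batch id is required: the input is returned untouched.
unsorted_result = sort_if_needed(rows, ["batch_id"], ["batch_id"])

# A non-constant column remains: sort by it alone.
sorted_result = sort_if_needed(rows, ["batch_id", "value"], ["batch_id"])
```

With the reporter's setup (a single constant partition column, no bucketing, no explicit ordering) the pruned ordering is empty, so the first call performs no sort at all.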


