Posted to issues@spark.apache.org by "Yuming Wang (Jira)" <ji...@apache.org> on 2020/02/21 00:26:00 UTC
[jira] [Updated] (SPARK-30768) Constraints inferred from inequality attributes
[ https://issues.apache.org/jira/browse/SPARK-30768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuming Wang updated SPARK-30768:
--------------------------------
Summary: Constraints inferred from inequality attributes (was: Constraints should be inferred from inequality attributes)
> Constraints inferred from inequality attributes
> -----------------------------------------------
>
> Key: SPARK-30768
> URL: https://issues.apache.org/jira/browse/SPARK-30768
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Priority: Major
>
> How to reproduce:
> {code:sql}
> create table SPARK_30768_1(c1 int, c2 int);
> create table SPARK_30768_2(c1 int, c2 int);
> {code}
> *Spark SQL*:
> {noformat}
> spark-sql> explain select t1.* from SPARK_30768_1 t1 join SPARK_30768_2 t2 on (t1.c1 > t2.c1) where t1.c1 = 3;
> == Physical Plan ==
> *(3) Project [c1#5, c2#6]
> +- BroadcastNestedLoopJoin BuildRight, Inner, (c1#5 > c1#7)
>    :- *(1) Project [c1#5, c2#6]
>    :  +- *(1) Filter (isnotnull(c1#5) AND (c1#5 = 3))
>    :     +- *(1) ColumnarToRow
>    :        +- FileScan parquet default.spark_30768_1[c1#5,c2#6] Batched: true, DataFilters: [isnotnull(c1#5), (c1#5 = 3)], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., PartitionFilters: [], PushedFilters: [IsNotNull(c1), EqualTo(c1,3)], ReadSchema: struct<c1:int,c2:int>
>    +- BroadcastExchange IdentityBroadcastMode, [id=#60]
>       +- *(2) Project [c1#7]
>          +- *(2) Filter isnotnull(c1#7)
>             +- *(2) ColumnarToRow
>                +- FileScan parquet default.spark_30768_2[c1#7] Batched: true, DataFilters: [isnotnull(c1#7)], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., PartitionFilters: [], PushedFilters: [IsNotNull(c1)], ReadSchema: struct<c1:int>
> {noformat}
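> *Hive* supports this feature. From {{t1.c1 > t2.c1}} and {{t1.c1 = 3}} it infers {{c1 < 3}} on t2 (see the {{filterExpr}} under alias t2), whereas the Spark plan above only filters t2 with {{isnotnull(c1)}}: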
> {noformat}
> hive> explain select t1.* from SPARK_30768_1 t1 join SPARK_30768_2 t2 on (t1.c1 > t2.c1) where t1.c1 = 3;
> Warning: Map Join MAPJOIN[13][bigTable=?] in task 'Stage-3:MAPRED' is a cross product
> OK
> STAGE DEPENDENCIES:
> Stage-4 is a root stage
> Stage-3 depends on stages: Stage-4
> Stage-0 depends on stages: Stage-3
> STAGE PLANS:
> Stage: Stage-4
> Map Reduce Local Work
> Alias -> Map Local Tables:
> $hdt$_0:t1
> Fetch Operator
> limit: -1
> Alias -> Map Local Operator Tree:
> $hdt$_0:t1
> TableScan
> alias: t1
> filterExpr: (c1 = 3) (type: boolean)
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> Filter Operator
> predicate: (c1 = 3) (type: boolean)
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> Select Operator
> expressions: c2 (type: int)
> outputColumnNames: _col1
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> HashTable Sink Operator
> keys:
> 0
> 1
> Stage: Stage-3
> Map Reduce
> Map Operator Tree:
> TableScan
> alias: t2
> filterExpr: (c1 < 3) (type: boolean)
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> Filter Operator
> predicate: (c1 < 3) (type: boolean)
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> Select Operator
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0
> 1
> outputColumnNames: _col1
> Statistics: Num rows: 1 Data size: 1 Basic stats: PARTIAL Column stats: NONE
> Select Operator
> expressions: 3 (type: int), _col1 (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 1 Data size: 1 Basic stats: PARTIAL Column stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 1 Basic stats: PARTIAL Column stats: NONE
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Execution mode: vectorized
> Local Work:
> Map Reduce Local Work
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> Time taken: 5.491 seconds, Fetched: 71 row(s)
> {noformat}
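
The inference requested here rests on a simple equivalence: with {{t1.c1 > t2.c1}} and {{t1.c1 = 3}}, any matching t2 row must satisfy {{t2.c1 < 3}}, so adding that predicate by hand should not change the result. The sketch below checks this on a small in-memory SQLite database. It is only an illustration: the sample rows are made up and do not come from the issue, and SQLite stands in for Spark or Hive purely to compare query results, not plans.

```python
import sqlite3

# In-memory database with the two tables from the reproduction steps.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table SPARK_30768_1(c1 int, c2 int);
    create table SPARK_30768_2(c1 int, c2 int);
    -- Hypothetical sample rows, chosen to include NULLs and boundary values.
    insert into SPARK_30768_1 values (3, 10), (3, 11), (5, 12), (NULL, 13);
    insert into SPARK_30768_2 values (1, 0), (2, 0), (3, 0), (4, 0), (NULL, 0);
""")

# The query from the issue.
original = """
    select t1.* from SPARK_30768_1 t1 join SPARK_30768_2 t2
    on (t1.c1 > t2.c1) where t1.c1 = 3
"""

# The same query with the inferred constraint written out by hand.
rewritten = original + " and t2.c1 < 3"

rows_original = sorted(conn.execute(original).fetchall())
rows_rewritten = sorted(conn.execute(rewritten).fetchall())

# The extra predicate is redundant for correctness, so the results match.
print(rows_original == rows_rewritten)
```

Until the optimizer infers such constraints itself, writing the extra {{t2.c1 < 3}} predicate into the query is a possible manual workaround. Whether it actually changes the Spark plan should be confirmed with {{explain}}.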
--
This message was sent by Atlassian Jira
(v8.3.4#803005)