Posted to issues@spark.apache.org by "Yuming Wang (Jira)" <ji...@apache.org> on 2020/02/21 00:26:00 UTC
[jira] [Updated] (SPARK-30768) Constraints inferred from inequality attributes

     [ https://issues.apache.org/jira/browse/SPARK-30768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-30768:
--------------------------------
    Summary: Constraints inferred from inequality attributes  (was: Constraints should be inferred from inequality attributes)

> Constraints inferred from inequality attributes
> -----------------------------------------------
>
>                 Key: SPARK-30768
>                 URL: https://issues.apache.org/jira/browse/SPARK-30768
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> How to reproduce:
> {code:sql}
> create table SPARK_30768_1(c1 int, c2 int);
> create table SPARK_30768_2(c1 int, c2 int);
> {code}
> *Spark SQL*:
> {noformat}
> spark-sql> explain select t1.* from SPARK_30768_1 t1 join SPARK_30768_2 t2 on (t1.c1 > t2.c1) where t1.c1 = 3;
> == Physical Plan ==
> *(3) Project [c1#5, c2#6]
> +- BroadcastNestedLoopJoin BuildRight, Inner, (c1#5 > c1#7)
>    :- *(1) Project [c1#5, c2#6]
>    :  +- *(1) Filter (isnotnull(c1#5) AND (c1#5 = 3))
>    :     +- *(1) ColumnarToRow
>    :        +- FileScan parquet default.spark_30768_1[c1#5,c2#6] Batched: true, DataFilters: [isnotnull(c1#5), (c1#5 = 3)], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., PartitionFilters: [], PushedFilters: [IsNotNull(c1), EqualTo(c1,3)], ReadSchema: struct<c1:int,c2:int>
>    +- BroadcastExchange IdentityBroadcastMode, [id=#60]
>       +- *(2) Project [c1#7]
>          +- *(2) Filter isnotnull(c1#7)
>             +- *(2) ColumnarToRow
>                +- FileScan parquet default.spark_30768_2[c1#7] Batched: true, DataFilters: [isnotnull(c1#7)], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous..., PartitionFilters: [], PushedFilters: [IsNotNull(c1)], ReadSchema: struct<c1:int>
> {noformat}
> *Hive* supports this feature:
> {noformat}
> hive> explain select t1.* from SPARK_30768_1 t1 join SPARK_30768_2 t2 on (t1.c1 > t2.c1) where t1.c1 = 3;
> Warning: Map Join MAPJOIN[13][bigTable=?] in task 'Stage-3:MAPRED' is a cross product
> OK
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-3 depends on stages: Stage-4
>   Stage-0 depends on stages: Stage-3
> STAGE PLANS:
>   Stage: Stage-4
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         $hdt$_0:t1
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         $hdt$_0:t1
>           TableScan
>             alias: t1
>             filterExpr: (c1 = 3) (type: boolean)
>             Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>             Filter Operator
>               predicate: (c1 = 3) (type: boolean)
>               Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>               Select Operator
>                 expressions: c2 (type: int)
>                 outputColumnNames: _col1
>                 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                 HashTable Sink Operator
>                   keys:
>                     0
>                     1
>   Stage: Stage-3
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: t2
>             filterExpr: (c1 < 3) (type: boolean)
>             Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>             Filter Operator
>               predicate: (c1 < 3) (type: boolean)
>               Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>               Select Operator
>                 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                 Map Join Operator
>                   condition map:
>                        Inner Join 0 to 1
>                   keys:
>                     0
>                     1
>                   outputColumnNames: _col1
>                   Statistics: Num rows: 1 Data size: 1 Basic stats: PARTIAL Column stats: NONE
>                   Select Operator
>                     expressions: 3 (type: int), _col1 (type: int)
>                     outputColumnNames: _col0, _col1
>                     Statistics: Num rows: 1 Data size: 1 Basic stats: PARTIAL Column stats: NONE
>                     File Output Operator
>                       compressed: false
>                       Statistics: Num rows: 1 Data size: 1 Basic stats: PARTIAL Column stats: NONE
>                       table:
>                           input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                           output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                           serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>       Execution mode: vectorized
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> Time taken: 5.491 seconds, Fetched: 71 row(s)
> {noformat}
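
Reading the two plans side by side: Hive filters t2 with (c1 < 3), which follows from t1.c1 > t2.c1 together with t1.c1 = 3, while Spark only adds isnotnull(c1#7) on that side. The sketch below is one way to check that writing the inferred predicate out by hand does not change the query result. It assumes an in-memory SQLite database as a stand-in for Spark and Hive, and the sample rows are hypothetical (they are not part of the issue); it says nothing about how either engine plans the query.

```python
import sqlite3

# In-memory SQLite stands in for Spark/Hive here (an assumption for this
# sketch); it only checks that the two queries return the same rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table SPARK_30768_1(c1 int, c2 int);
    create table SPARK_30768_2(c1 int, c2 int);
""")

# Hypothetical sample rows, including NULLs and values on both sides of 3.
conn.executemany("insert into SPARK_30768_1 values (?, ?)",
                 [(3, 10), (3, 11), (5, 12), (None, 13)])
conn.executemany("insert into SPARK_30768_2 values (?, ?)",
                 [(1, 20), (2, 21), (3, 22), (4, 23), (None, 24)])

# The query from the issue description.
original = """
    select t1.* from SPARK_30768_1 t1 join SPARK_30768_2 t2
    on (t1.c1 > t2.c1) where t1.c1 = 3
"""
# The same query with the inferred constraint on t2 written out by hand.
rewritten = original + " and t2.c1 < 3"

original_rows = sorted(conn.execute(original).fetchall())
rewritten_rows = sorted(conn.execute(rewritten).fetchall())
print(original_rows == rewritten_rows)
```

Until the optimizer infers the constraint itself, adding `and t2.c1 < 3` to the query by hand is a possible workaround that lets the extra filter reach the scan of SPARK_30768_2.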



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
