You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2019/04/01 16:11:02 UTC

[jira] [Updated] (SPARK-27278) SimplifyExtractValueOps doesn't optimise Literal map access to CASE WHEN ... END

     [ https://issues.apache.org/jira/browse/SPARK-27278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-27278:
----------------------------------
    Issue Type: Improvement  (was: Bug)

> SimplifyExtractValueOps doesn't optimise Literal map access to CASE WHEN ... END
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-27278
>                 URL: https://issues.apache.org/jira/browse/SPARK-27278
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>         Environment: Spark 2.4.0
>            Reporter: Huon Wilson
>            Priority: Major
>              Labels: Optimization
>
> With a map that isn't constant-foldable, spark will optimise an access to a series of {{CASE WHEN ... THEN ... WHEN ... THEN ... END}}, for instance
> {code:none}
> scala> spark.range(1000).select(map(lit(1), lit(1), lit(2), 'id)('id) as "x").explain
> == Physical Plan ==
> *(1) Project [CASE WHEN (cast(id#180L as int) = 1) THEN 1 WHEN (cast(id#180L as int) = 2) THEN id#180L END AS x#182L]
> +- *(1) Range (0, 1000, step=1, splits=12)
> {code}
> This results in an efficient series of ifs and elses, in the code generation:
> {code:java}
> /* 037 */       boolean project_isNull_3 = false;
> /* 038 */       int project_value_3 = -1;
> /* 039 */       if (!false) {
> /* 040 */         project_value_3 = (int) project_expr_0_0;
> /* 041 */       }
> /* 042 */
> /* 043 */       boolean project_value_2 = false;
> /* 044 */       project_value_2 = project_value_3 == 1;
> /* 045 */       if (!false && project_value_2) {
> /* 046 */         project_caseWhenResultState_0 = (byte)(false ? 1 : 0);
> /* 047 */         project_project_value_1_0 = 1L;
> /* 048 */         continue;
> /* 049 */       }
> /* 050 */
> /* 051 */       boolean project_isNull_8 = false;
> /* 052 */       int project_value_8 = -1;
> /* 053 */       if (!false) {
> /* 054 */         project_value_8 = (int) project_expr_0_0;
> /* 055 */       }
> /* 056 */
> /* 057 */       boolean project_value_7 = false;
> /* 058 */       project_value_7 = project_value_8 == 2;
> /* 059 */       if (!false && project_value_7) {
> /* 060 */         project_caseWhenResultState_0 = (byte)(false ? 1 : 0);
> /* 061 */         project_project_value_1_0 = project_expr_0_0;
> /* 062 */         continue;
> /* 063 */       }
> {code}
> If the map can be constant folded, the constant folding happens first, and the {{SimplifyExtractValueOps}} optimisation doesn't trigger, resulting doing a map traversal and more dynamic checks:
> {code:none}
> scala> spark.range(1000).select(map(lit(1), lit(1), lit(2), lit(2))('id) as "x").explain
> == Physical Plan ==
> *(1) Project [keys: [1,2], values: [1,2][cast(id#195L as int)] AS x#197]
> +- *(1) Range (0, 1000, step=1, splits=12)
> {code}
> The {{keys: ..., values: ...}} is from the {{ArrayBasedMapData}} type, which is what is stored in the {{Literal}} form of the {{map(...)}} expression in that select. The code generated is less efficient, since it has to do a manual dynamic traversal of the map's array of keys, with type casts etc.:
> {code:java}
> /* 099 */           int project_index_0 = 0;
> /* 100 */           boolean project_found_0 = false;
> /* 101 */           while (project_index_0 < project_length_0 && !project_found_0) {
> /* 102 */             final int project_key_0 = project_keys_0.getInt(project_index_0);
> /* 103 */             if (project_key_0 == project_value_2) {
> /* 104 */               project_found_0 = true;
> /* 105 */             } else {
> /* 106 */               project_index_0++;
> /* 107 */             }
> /* 108 */           }
> /* 109 */
> /* 110 */           if (!project_found_0) {
> /* 111 */             project_isNull_0 = true;
> /* 112 */           } else {
> /* 113 */             project_value_0 = project_values_0.getInt(project_index_0);
> /* 114 */           }
> {code}
> It looks like the problem is in {{SimplifyExtractValueOps}}, which doesn't handle {{GetMapValue(Literal(...), key)}}, only the {{CreateMap}} form:
> {code:scala}
>       case GetMapValue(CreateMap(elems), key) => CaseKeyWhen(key, elems)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org