Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2017/03/01 14:31:38 UTC

[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/17122

    [SPARK-19786][SQL] Facilitate loop optimizations in a JIT compiler regarding range()

    ## What changes were proposed in this pull request?
    
    This PR improves performance of operations with `range()` by changing Java code generated by Catalyst. This PR is inspired by the [blog article](https://databricks.com/blog/2017/02/16/processing-trillion-rows-per-second-single-machine-can-nested-loop-joins-fast.html).
    
    This PR changes generated code in the following two points.
    1. Replace the while-loop over long instance variables with a for-loop over int local variables.
    2. Suppress generation of the `shouldStop()` check when it is unnecessary (e.g. when no `append()` is generated).
    
    These changes facilitate loop optimizations in the JIT compiler by feeding it simplified Java code. The performance is improved by 7.6x.
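
    For a condensed view, the sketch below distills the two loop shapes from the full generated code listed later in this description. The class and member names (`LoopShapeSketch`, `count`, `number`, `batchEnd`) are simplified stand-ins for the generated `range_*`/`agg_*` fields, not actual Catalyst output.
    ```java
    // Minimal, self-contained sketch of the loop shapes before and after this PR.
    class LoopShapeSketch {
      private long count;     // stands in for agg_bufValue
      private long number;    // stands in for range_number (long instance variable)
      private long batchEnd;  // stands in for range_batchEnd

      // Before: while-loop driven by long instance variables, with a per-row shouldStop() check
      void consumeBatchOld() {
        while (number != batchEnd) {
          long value = number;          // row value (unused by count(id))
          number += 1L;
          count += 1L;                  // aggregate update
          if (shouldStop()) return;
        }
      }

      // After: counted for-loop over int locals; the shouldStop() check is not emitted
      // because the aggregate consumes every row without calling append()
      void consumeBatchNew() {
        long range = batchEnd - number;
        if (range != 0L) {
          int localEnd = (int) (range / 1L);
          for (int localIdx = 0; localIdx < localEnd; localIdx++) {
            long value = ((long) localIdx * 1L) + number;  // row value (unused by count(id))
            count += 1L;                // aggregate update
          }
          number = batchEnd;
        }
      }

      private boolean shouldStop() { return false; }  // placeholder for BufferedRowIterator.shouldStop()
    }
    ```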
    
    Benchmark program:
    ```scala
    // Assumes a running SparkSession named `sparkSession`.
    // Benchmark is Spark's micro-benchmark utility (org.apache.spark.util.Benchmark at the time of this PR).
    import org.apache.spark.util.Benchmark

    val N = 1 << 29
    val iters = 2
    val benchmark = new Benchmark("range.count", N * iters)
    benchmark.addCase(s"with this PR") { i =>
      var n = 0
      var len = 0
      while (n < iters) {
        len += sparkSession.range(N).selectExpr("count(id)").collect.length
        n += 1
      }
    }
    benchmark.run
    ```
    
    Performance result without this PR
    ```
    OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14 on Linux 4.4.0-47-generic
    Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
    range.count:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    w/o this PR                                   1349 / 1356        796.2           1.3       1.0X
    ```
    
    Performance result with this PR
    ```
    OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14 on Linux 4.4.0-47-generic
    Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
    range.count:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    with this PR                                   177 /  271       6065.3           0.2       1.0X
    ```
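
    The 7.6x figure above comes from comparing the best times: 1349 ms / 177 ms ≈ 7.6.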
    
    Here is a comparison of the generated code without and with this PR. Only the `agg_doAggregateWithoutKey` method changes.
    
    Generated code without this PR
    ```java
    
    /* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
    /* 006 */   private Object[] references;
    /* 007 */   private scala.collection.Iterator[] inputs;
    /* 008 */   private boolean agg_initAgg;
    /* 009 */   private boolean agg_bufIsNull;
    /* 010 */   private long agg_bufValue;
    /* 011 */   private org.apache.spark.sql.execution.metric.SQLMetric range_numOutputRows;
    /* 012 */   private org.apache.spark.sql.execution.metric.SQLMetric range_numGeneratedRows;
    /* 013 */   private boolean range_initRange;
    /* 014 */   private long range_number;
    /* 015 */   private TaskContext range_taskContext;
    /* 016 */   private InputMetrics range_inputMetrics;
    /* 017 */   private long range_batchEnd;
    /* 018 */   private long range_numElementsTodo;
    /* 019 */   private scala.collection.Iterator range_input;
    /* 020 */   private UnsafeRow range_result;
    /* 021 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder range_holder;
    /* 022 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter range_rowWriter;
    /* 023 */   private org.apache.spark.sql.execution.metric.SQLMetric agg_numOutputRows;
    /* 024 */   private org.apache.spark.sql.execution.metric.SQLMetric agg_aggTime;
    /* 025 */   private UnsafeRow agg_result;
    /* 026 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder;
    /* 027 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter agg_rowWriter;
    /* 028 */
    /* 029 */   public GeneratedIterator(Object[] references) {
    /* 030 */     this.references = references;
    /* 031 */   }
    /* 032 */
    /* 033 */   public void init(int index, scala.collection.Iterator[] inputs) {
    /* 034 */     partitionIndex = index;
    /* 035 */     this.inputs = inputs;
    /* 036 */     agg_initAgg = false;
    /* 037 */
    /* 038 */     this.range_numOutputRows = (org.apache.spark.sql.execution.metric.SQLMetric) references[0];
    /* 039 */     this.range_numGeneratedRows = (org.apache.spark.sql.execution.metric.SQLMetric) references[1];
    /* 040 */     range_initRange = false;
    /* 041 */     range_number = 0L;
    /* 042 */     range_taskContext = TaskContext.get();
    /* 043 */     range_inputMetrics = range_taskContext.taskMetrics().inputMetrics();
    /* 044 */     range_batchEnd = 0;
    /* 045 */     range_numElementsTodo = 0L;
    /* 046 */     range_input = inputs[0];
    /* 047 */     range_result = new UnsafeRow(1);
    /* 048 */     this.range_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(range_result, 0);
    /* 049 */     this.range_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(range_holder, 1);
    /* 050 */     this.agg_numOutputRows = (org.apache.spark.sql.execution.metric.SQLMetric) references[2];
    /* 051 */     this.agg_aggTime = (org.apache.spark.sql.execution.metric.SQLMetric) references[3];
    /* 052 */     agg_result = new UnsafeRow(1);
    /* 053 */     this.agg_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_result, 0);
    /* 054 */     this.agg_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_holder, 1);
    /* 055 */
    /* 056 */   }
    /* 057 */
    /* 058 */   private void agg_doAggregateWithoutKey() throws java.io.IOException {
    /* 059 */     // initialize aggregation buffer
    /* 060 */     agg_bufIsNull = false;
    /* 061 */     agg_bufValue = 0L;
    /* 062 */
    /* 063 */     // initialize Range
    /* 064 */     if (!range_initRange) {
    /* 065 */       range_initRange = true;
    /* 066 */       initRange(partitionIndex);
    /* 067 */     }
    /* 068 */
    /* 069 */     while (true) {
    /* 070 */       while (range_number != range_batchEnd) {
    /* 071 */         long range_value = range_number;
    /* 072 */         range_number += 1L;
    /* 073 */
    /* 074 */         // do aggregate
    /* 075 */         // common sub-expressions
    /* 076 */
    /* 077 */         // evaluate aggregate function
    /* 078 */         boolean agg_isNull1 = false;
    /* 079 */
    /* 080 */         long agg_value1 = -1L;
    /* 081 */         agg_value1 = agg_bufValue + 1L;
    /* 082 */         // update aggregation buffer
    /* 083 */         agg_bufIsNull = false;
    /* 084 */         agg_bufValue = agg_value1;
    /* 085 */
    /* 086 */         if (shouldStop()) return;
    /* 087 */       }
    /* 088 */
    /* 089 */       if (range_taskContext.isInterrupted()) {
    /* 090 */         throw new TaskKilledException();
    /* 091 */       }
    /* 092 */
    /* 093 */       long range_nextBatchTodo;
    /* 094 */       if (range_numElementsTodo > 1000L) {
    /* 095 */         range_nextBatchTodo = 1000L;
    /* 096 */         range_numElementsTodo -= 1000L;
    /* 097 */       } else {
    /* 098 */         range_nextBatchTodo = range_numElementsTodo;
    /* 099 */         range_numElementsTodo = 0;
    /* 100 */         if (range_nextBatchTodo == 0) break;
    /* 101 */       }
    /* 102 */       range_numOutputRows.add(range_nextBatchTodo);
    /* 103 */       range_inputMetrics.incRecordsRead(range_nextBatchTodo);
    /* 104 */
    /* 105 */       range_batchEnd += range_nextBatchTodo * 1L;
    /* 106 */     }
    /* 107 */
    /* 108 */   }
    /* 109 */
    /* 110 */   private void initRange(int idx) {
    /* 111 */     java.math.BigInteger index = java.math.BigInteger.valueOf(idx);
    /* 112 */     java.math.BigInteger numSlice = java.math.BigInteger.valueOf(2L);
    /* 113 */     java.math.BigInteger numElement = java.math.BigInteger.valueOf(10000L);
    /* 114 */     java.math.BigInteger step = java.math.BigInteger.valueOf(1L);
    /* 115 */     java.math.BigInteger start = java.math.BigInteger.valueOf(0L);
    /* 116 */     long partitionEnd;
    /* 117 */
    /* 118 */     java.math.BigInteger st = index.multiply(numElement).divide(numSlice).multiply(step).add(start);
    /* 119 */     if (st.compareTo(java.math.BigInteger.valueOf(Long.MAX_VALUE)) > 0) {
    /* 120 */       range_number = Long.MAX_VALUE;
    /* 121 */     } else if (st.compareTo(java.math.BigInteger.valueOf(Long.MIN_VALUE)) < 0) {
    /* 122 */       range_number = Long.MIN_VALUE;
    /* 123 */     } else {
    /* 124 */       range_number = st.longValue();
    /* 125 */     }
    /* 126 */     range_batchEnd = range_number;
    /* 127 */
    /* 128 */     java.math.BigInteger end = index.add(java.math.BigInteger.ONE).multiply(numElement).divide(numSlice)
    /* 129 */     .multiply(step).add(start);
    /* 130 */     if (end.compareTo(java.math.BigInteger.valueOf(Long.MAX_VALUE)) > 0) {
    /* 131 */       partitionEnd = Long.MAX_VALUE;
    /* 132 */     } else if (end.compareTo(java.math.BigInteger.valueOf(Long.MIN_VALUE)) < 0) {
    /* 133 */       partitionEnd = Long.MIN_VALUE;
    /* 134 */     } else {
    /* 135 */       partitionEnd = end.longValue();
    /* 136 */     }
    /* 137 */
    /* 138 */     java.math.BigInteger startToEnd = java.math.BigInteger.valueOf(partitionEnd).subtract(
    /* 139 */       java.math.BigInteger.valueOf(range_number));
    /* 140 */     range_numElementsTodo  = startToEnd.divide(step).longValue();
    /* 141 */     if (range_numElementsTodo < 0) {
    /* 142 */       range_numElementsTodo = 0;
    /* 143 */     } else if (startToEnd.remainder(step).compareTo(java.math.BigInteger.valueOf(0L)) != 0) {
    /* 144 */       range_numElementsTodo++;
    /* 145 */     }
    /* 146 */   }
    /* 147 */
    /* 148 */   protected void processNext() throws java.io.IOException {
    /* 149 */     while (!agg_initAgg) {
    /* 150 */       agg_initAgg = true;
    /* 151 */       long agg_beforeAgg = System.nanoTime();
    /* 152 */       agg_doAggregateWithoutKey();
    /* 153 */       agg_aggTime.add((System.nanoTime() - agg_beforeAgg) / 1000000);
    /* 154 */
    /* 155 */       // output the result
    /* 156 */
    /* 157 */       agg_numOutputRows.add(1);
    /* 158 */       agg_rowWriter.zeroOutNullBytes();
    /* 159 */
    /* 160 */       if (agg_bufIsNull) {
    /* 161 */         agg_rowWriter.setNullAt(0);
    /* 162 */       } else {
    /* 163 */         agg_rowWriter.write(0, agg_bufValue);
    /* 164 */       }
    /* 165 */       append(agg_result);
    /* 166 */     }
    /* 167 */   }
    /* 168 */ }
    ```
    
    Generated code with this PR
    ```java
    /* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
    /* 006 */   private Object[] references;
    /* 007 */   private scala.collection.Iterator[] inputs;
    /* 008 */   private boolean agg_initAgg;
    /* 009 */   private boolean agg_bufIsNull;
    /* 010 */   private long agg_bufValue;
    /* 011 */   private org.apache.spark.sql.execution.metric.SQLMetric range_numOutputRows;
    /* 012 */   private org.apache.spark.sql.execution.metric.SQLMetric range_numGeneratedRows;
    /* 013 */   private boolean range_initRange;
    /* 014 */   private long range_number;
    /* 015 */   private TaskContext range_taskContext;
    /* 016 */   private InputMetrics range_inputMetrics;
    /* 017 */   private long range_batchEnd;
    /* 018 */   private long range_numElementsTodo;
    /* 019 */   private scala.collection.Iterator range_input;
    /* 020 */   private UnsafeRow range_result;
    /* 021 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder range_holder;
    /* 022 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter range_rowWriter;
    /* 023 */   private org.apache.spark.sql.execution.metric.SQLMetric agg_numOutputRows;
    /* 024 */   private org.apache.spark.sql.execution.metric.SQLMetric agg_aggTime;
    /* 025 */   private UnsafeRow agg_result;
    /* 026 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder;
    /* 027 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter agg_rowWriter;
    /* 028 */
    /* 029 */   public GeneratedIterator(Object[] references) {
    /* 030 */     this.references = references;
    /* 031 */   }
    /* 032 */
    /* 033 */   public void init(int index, scala.collection.Iterator[] inputs) {
    /* 034 */     partitionIndex = index;
    /* 035 */     this.inputs = inputs;
    /* 036 */     agg_initAgg = false;
    /* 037 */
    /* 038 */     this.range_numOutputRows = (org.apache.spark.sql.execution.metric.SQLMetric) references[0];
    /* 039 */     this.range_numGeneratedRows = (org.apache.spark.sql.execution.metric.SQLMetric) references[1];
    /* 040 */     range_initRange = false;
    /* 041 */     range_number = 0L;
    /* 042 */     range_taskContext = TaskContext.get();
    /* 043 */     range_inputMetrics = range_taskContext.taskMetrics().inputMetrics();
    /* 044 */     range_batchEnd = 0;
    /* 045 */     range_numElementsTodo = 0L;
    /* 046 */     range_input = inputs[0];
    /* 047 */     range_result = new UnsafeRow(1);
    /* 048 */     this.range_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(range_result, 0);
    /* 049 */     this.range_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(range_holder, 1);
    /* 050 */     this.agg_numOutputRows = (org.apache.spark.sql.execution.metric.SQLMetric) references[2];
    /* 051 */     this.agg_aggTime = (org.apache.spark.sql.execution.metric.SQLMetric) references[3];
    /* 052 */     agg_result = new UnsafeRow(1);
    /* 053 */     this.agg_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_result, 0);
    /* 054 */     this.agg_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_holder, 1);
    /* 055 */
    /* 056 */   }
    /* 057 */
    /* 058 */   private void agg_doAggregateWithoutKey() throws java.io.IOException {
    /* 059 */     // initialize aggregation buffer
    /* 060 */     agg_bufIsNull = false;
    /* 061 */     agg_bufValue = 0L;
    /* 062 */
    /* 063 */     // initialize Range
    /* 064 */     if (!range_initRange) {
    /* 065 */       range_initRange = true;
    /* 066 */       initRange(partitionIndex);
    /* 067 */     }
    /* 068 */
    /* 069 */     while (true) {
    /* 070 */       long range_range = range_batchEnd - range_number;
    /* 071 */       if (range_range != 0L) {
    /* 072 */         int range_localEnd = (int)(range_range / 1L);
    /* 073 */         for (int range_localIdx = 0; range_localIdx < range_localEnd; range_localIdx++) {
    /* 074 */           long range_value = ((long)range_localIdx * 1L) + range_number;
    /* 075 */
    /* 076 */           // do aggregate
    /* 077 */           // common sub-expressions
    /* 078 */
    /* 079 */           // evaluate aggregate function
    /* 080 */           boolean agg_isNull1 = false;
    /* 081 */
    /* 082 */           long agg_value1 = -1L;
    /* 083 */           agg_value1 = agg_bufValue + 1L;
    /* 084 */           // update aggregation buffer
    /* 085 */           agg_bufIsNull = false;
    /* 086 */           agg_bufValue = agg_value1;
    /* 087 */
    /* 088 */           // shouldStop check is eliminated
    /* 089 */         }
    /* 090 */         range_number = range_batchEnd;
    /* 091 */       }
    /* 092 */
    /* 093 */       if (range_taskContext.isInterrupted()) {
    /* 094 */         throw new TaskKilledException();
    /* 095 */       }
    /* 096 */
    /* 097 */       long range_nextBatchTodo;
    /* 098 */       if (range_numElementsTodo > 1000L) {
    /* 099 */         range_nextBatchTodo = 1000L;
    /* 100 */         range_numElementsTodo -= 1000L;
    /* 101 */       } else {
    /* 102 */         range_nextBatchTodo = range_numElementsTodo;
    /* 103 */         range_numElementsTodo = 0;
    /* 104 */         if (range_nextBatchTodo == 0) break;
    /* 105 */       }
    /* 106 */       range_numOutputRows.add(range_nextBatchTodo);
    /* 107 */       range_inputMetrics.incRecordsRead(range_nextBatchTodo);
    /* 108 */
    /* 109 */       range_batchEnd += range_nextBatchTodo * 1L;
    /* 110 */     }
    /* 111 */
    /* 112 */   }
    /* 113 */
    /* 114 */   private void initRange(int idx) {
    /* 115 */     java.math.BigInteger index = java.math.BigInteger.valueOf(idx);
    /* 116 */     java.math.BigInteger numSlice = java.math.BigInteger.valueOf(2L);
    /* 117 */     java.math.BigInteger numElement = java.math.BigInteger.valueOf(10000L);
    /* 118 */     java.math.BigInteger step = java.math.BigInteger.valueOf(1L);
    /* 119 */     java.math.BigInteger start = java.math.BigInteger.valueOf(0L);
    /* 120 */     long partitionEnd;
    /* 121 */
    /* 122 */     java.math.BigInteger st = index.multiply(numElement).divide(numSlice).multiply(step).add(start);
    /* 123 */     if (st.compareTo(java.math.BigInteger.valueOf(Long.MAX_VALUE)) > 0) {
    /* 124 */       range_number = Long.MAX_VALUE;
    /* 125 */     } else if (st.compareTo(java.math.BigInteger.valueOf(Long.MIN_VALUE)) < 0) {
    /* 126 */       range_number = Long.MIN_VALUE;
    /* 127 */     } else {
    /* 128 */       range_number = st.longValue();
    /* 129 */     }
    /* 130 */     range_batchEnd = range_number;
    /* 131 */
    /* 132 */     java.math.BigInteger end = index.add(java.math.BigInteger.ONE).multiply(numElement).divide(numSlice)
    /* 133 */     .multiply(step).add(start);
    /* 134 */     if (end.compareTo(java.math.BigInteger.valueOf(Long.MAX_VALUE)) > 0) {
    /* 135 */       partitionEnd = Long.MAX_VALUE;
    /* 136 */     } else if (end.compareTo(java.math.BigInteger.valueOf(Long.MIN_VALUE)) < 0) {
    /* 137 */       partitionEnd = Long.MIN_VALUE;
    /* 138 */     } else {
    /* 139 */       partitionEnd = end.longValue();
    /* 140 */     }
    /* 141 */
    /* 142 */     java.math.BigInteger startToEnd = java.math.BigInteger.valueOf(partitionEnd).subtract(
    /* 143 */       java.math.BigInteger.valueOf(range_number));
    /* 144 */     range_numElementsTodo  = startToEnd.divide(step).longValue();
    /* 145 */     if (range_numElementsTodo < 0) {
    /* 146 */       range_numElementsTodo = 0;
    /* 147 */     } else if (startToEnd.remainder(step).compareTo(java.math.BigInteger.valueOf(0L)) != 0) {
    /* 148 */       range_numElementsTodo++;
    /* 149 */     }
    /* 150 */   }
    /* 151 */
    /* 152 */   protected void processNext() throws java.io.IOException {
    /* 153 */     while (!agg_initAgg) {
    /* 154 */       agg_initAgg = true;
    /* 155 */       long agg_beforeAgg = System.nanoTime();
    /* 156 */       agg_doAggregateWithoutKey();
    /* 157 */       agg_aggTime.add((System.nanoTime() - agg_beforeAgg) / 1000000);
    /* 158 */
    /* 159 */       // output the result
    /* 160 */
    /* 161 */       agg_numOutputRows.add(1);
    /* 162 */       agg_rowWriter.zeroOutNullBytes();
    /* 163 */
    /* 164 */       if (agg_bufIsNull) {
    /* 165 */         agg_rowWriter.setNullAt(0);
    /* 166 */       } else {
    /* 167 */         agg_rowWriter.write(0, agg_bufValue);
    /* 168 */       }
    /* 169 */       append(agg_result);
    /* 170 */     }
    /* 171 */   }
    /* 172 */ }
    ```
    
    Part of the `shouldStop()` suppression was originally developed by @inouehrs.
    
    ## How was this patch tested?
    
    Added new tests to `DataFrameRangeSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-19786

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17122.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17122
    
----
commit 47f405c32ffac9b0356050c0d6bbb8c0ea5e0f51
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-03-01T14:01:43Z

    initial commit

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103858474
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
    @@ -69,6 +69,7 @@ trait BaseLimitExec extends UnaryExecNode with CodegenSupport {
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         val stopEarly = ctx.freshName("stopEarly")
         ctx.addMutableState("boolean", stopEarly, s"$stopEarly = false;")
    +    shouldStopRequired = true // loop may break early even without append in loop body
    --- End diff --
    
    Good catch. This implementation was based on a slightly older revision that did not yet have the `stopEarly()` method. Removed this line.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r104090563
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,18 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    +   *
    +   * isShouldStopRequired: require to insert shouldStop() into the loop if true
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    return shouldStopRequired && !(this.parent != null && !this.parent.isShouldStopRequired)
    +  }
    +
    +  // set false if doConsume() does not insert append() that requires shouldStop() in the loop
    --- End diff --
    
    Looks good, done.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #73737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73737/testReport)** for PR 17122 at commit [`6697928`](https://github.com/apache/spark/commit/6697928e4ff8cf93c4b63c9b6e4b18bec4a2f87a).




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r104285537
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,21 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * For optimization to suppress shouldStop() in a loop of WholeStageCodegen.
    +   * Returning true means we need to insert shouldStop() into the loop producing rows, if any.
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    return shouldStopRequired && !(this.parent != null && !this.parent.isShouldStopRequired)
    +  }
    +
    +  /**
    +   * Set to false if this plan consumes all rows produced by children but doesn't output row
    +   * to buffer by calling append(), so the children don't require shouldStop()
    +   * in the loop of producing rows.
    +   */
    +  protected def shouldStopRequired: Boolean = true
    --- End diff --
    
    Thank you for your comment. I overlooked Sort. 




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    LGTM merging to master




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74094/
    Test PASSed.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103862311
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
    @@ -434,6 +434,17 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
         val input = ctx.freshName("input")
         // Right now, Range is only used when there is one upstream.
         ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];")
    +
    +    val localIdx = ctx.freshName("localIdx")
    +    val localEnd = ctx.freshName("localEnd")
    +    val range = ctx.freshName("range")
    +    // we need to place consume() before calling isShouldStopRequired
    --- End diff --
    
    Thank you, done.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73904/
    Test PASSed.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r104666030
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,21 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * For optimization to suppress shouldStop() in a loop of WholeStageCodegen.
    +   * Returning true means we need to insert shouldStop() into the loop producing rows, if any.
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    return shouldStopRequired && !(this.parent != null && !this.parent.isShouldStopRequired)
    --- End diff --
    
    thanks, done




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #73689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73689/testReport)** for PR 17122 at commit [`47f405c`](https://github.com/apache/spark/commit/47f405c32ffac9b0356050c0d6bbb8c0ea5e0f51).




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #73904 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73904/testReport)** for PR 17122 at commit [`c2f7939`](https://github.com/apache/spark/commit/c2f79390f25fe1b4bad11603385fe464505178a4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103862282
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +207,13 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /* for optimization */
    +  var shouldStopRequired: Boolean = false
    --- End diff --
    
    Sure, done




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103860938
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +207,13 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /* for optimization */
    --- End diff --
    
    This deserves a better comment.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r104468446
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,21 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * For optimization to suppress shouldStop() in a loop of WholeStageCodegen.
    +   * Returning true means we need to insert shouldStop() into the loop producing rows, if any.
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    return shouldStopRequired && !(this.parent != null && !this.parent.isShouldStopRequired)
    +  }
    +
    +  /**
    +   * Set to false if this plan consumes all rows produced by children but doesn't output row
    +   * to buffer by calling append(), so the children don't require shouldStop()
    +   * in the loop of producing rows.
    +   */
    +  protected def shouldStopRequired: Boolean = true
    --- End diff --
    
    How would sort be improved? Sort is a very expensive operation, so it dominates the runtime of the job. The only possible improvement here is that you could avoid sorting with range (assuming that we do not overflow).




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103923150
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,18 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    +   *
    +   * isShouldStopRequired: require to insert shouldStop() into the loop if true
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    shouldStopRequired || (this.parent != null && this.parent.isShouldStopRequired)
    +  }
    +
    +  // set true if doConsume() inserts append() method that requires shouldStop() in the loop
    +  protected var shouldStopRequired: Boolean = false
    --- End diff --
    
    @hvanhovell Technically we can do that, and it looks simple. On the other hand, we would then have to maintain the immutable definition carefully, checking whether `append()` is required, in particular whenever a new CodegenSupport-related class is added. In contrast, the current approach makes the var easy to maintain: when a new CodegenSupport-related class is added, there is no need to investigate it carefully.
    
    This is a trade-off between simplicity and maintainability. What do you think?




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17122




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #73745 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73745/testReport)** for PR 17122 at commit [`6697928`](https://github.com/apache/spark/commit/6697928e4ff8cf93c4b63c9b6e4b18bec4a2f87a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103865206
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /*
    +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    +   */
    +  // true: require to insert shouldStop() into a loop
    +  protected var shouldStopRequired: Boolean = false
    +
    --- End diff --
    
    ditto




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103840075
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
    @@ -442,11 +453,15 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
           | }
           |
           | while (true) {
    -      |   while ($number != $batchEnd) {
    -      |     long $value = $number;
    -      |     $number += ${step}L;
    -      |     ${consume(ctx, Seq(ev))}
    -      |     if (shouldStop()) return;
    +      |   long $range = $batchEnd - $number;
    +      |   if ($range != 0L) {
    +      |     int $localEnd = (int)($range / ${step}L);
    +      |     for (int $localIdx = 0; $localIdx < $localEnd; $localIdx++) {
    +      |       long $value = ((long)$localIdx * ${step}L) + $number;
    +      |       $body
    +      |       $shouldStop
    --- End diff --
    
    oh. nvm. the outer-most `WholeStageCodegenExec`'s `shouldStopRequired` is true.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103862423
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /*
    +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    +   */
    +  // true: require to insert shouldStop() into a loop
    --- End diff --
    
    ??




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103926506
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,18 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    +   *
    +   * isShouldStopRequired: require to insert shouldStop() into the loop if true
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    shouldStopRequired || (this.parent != null && this.parent.isShouldStopRequired)
    +  }
    +
    +  // set true if doConsume() inserts append() method that requires shouldStop() in the loop
    +  protected var shouldStopRequired: Boolean = false
    --- End diff --
    
    I would +1 for this. That is part of the reason why I said it complicates the logic in my previous comment.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    I would +1 for this. That is part of the reason why I said it complicates the logic in my previous comment.
    
    On Thu, Mar 2, 2017 at 9:40 PM, Herman van Hovell <no...@github.com>
    wrote:
    
    > *@hvanhovell* commented on this pull request.
    > ------------------------------
    >
    > In sql/core/src/main/scala/org/apache/spark/sql/execution/
    > WholeStageCodegenExec.scala
    > <https://github.com/apache/spark/pull/17122#discussion_r103925096>:
    >
    > > @@ -206,6 +206,18 @@ trait CodegenSupport extends SparkPlan {
    >    def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
    >      throw new UnsupportedOperationException
    >    }
    > +
    > +  /**
    > +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    > +   *
    > +   * isShouldStopRequired: require to insert shouldStop() into the loop if true
    > +   */
    > +  def isShouldStopRequired: Boolean = {
    > +    shouldStopRequired || (this.parent != null && this.parent.isShouldStopRequired)
    > +  }
    > +
    > +  // set true if doConsume() inserts append() method that requires shouldStop() in the loop
    > +  protected var shouldStopRequired: Boolean = false
    >
    > We spend quite a bit of time debugging issues caused by poorly managed
    > mutable vars in code generation. So I'd rather avoid it.





[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #73702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73702/testReport)** for PR 17122 at commit [`6913b45`](https://github.com/apache/spark/commit/6913b45362a36d0d197a64a6dc18e859fbf7de07).




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103861241
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +207,13 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /* for optimization */
    +  var shouldStopRequired: Boolean = false
    --- End diff --
    
    Please add `protected`.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r104165504
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,21 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * For optimization to suppress shouldStop() in a loop of WholeStageCodegen.
    +   * Returning true means we need to insert shouldStop() into the loop producing rows, if any.
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    return shouldStopRequired && !(this.parent != null && !this.parent.isShouldStopRequired)
    +  }
    +
    +  /**
    +   * Set to false if this plan consumes all rows produced by children but doesn't output row
    +   * to buffer by calling append(), so the children don't require shouldStop()
    +   * in the loop of producing rows.
    +   */
    +  protected def shouldStopRequired: Boolean = true
    --- End diff --
    
    Should this be set to true for all blocking operators? Sort for instance?




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73730/
    Test FAILed.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #73702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73702/testReport)** for PR 17122 at commit [`6913b45`](https://github.com/apache/spark/commit/6913b45362a36d0d197a64a6dc18e859fbf7de07).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103837174
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
    @@ -434,6 +434,17 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
         val input = ctx.freshName("input")
         // Right now, Range is only used when there is one upstream.
         ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];")
    +
    +    val localIdx = ctx.freshName("localIdx")
    +    val localEnd = ctx.freshName("localEnd")
    +    val range = ctx.freshName("range")
    +    // we need to place consume() before calling isShouldStopRequired
    +    val body = consume(ctx, Seq(ev))
    +    val shouldStop = if (isShouldStopRequired) {
    --- End diff --
    
    `isShouldStopRequired` complicates the logic. Is it necessary? How much improvement does it bring?




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103925096
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,18 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    +   *
    +   * isShouldStopRequired: require to insert shouldStop() into the loop if true
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    shouldStopRequired || (this.parent != null && this.parent.isShouldStopRequired)
    +  }
    +
    +  // set true if doConsume() inserts append() method that requires shouldStop() in the loop
    +  protected var shouldStopRequired: Boolean = false
    --- End diff --
    
    We spend quite a bit of time debugging issues caused by poorly managed mutable vars in code generation. So I'd rather avoid it.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r104472230
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,21 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * For optimization to suppress shouldStop() in a loop of WholeStageCodegen.
    +   * Returning true means we need to insert shouldStop() into the loop producing rows, if any.
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    return shouldStopRequired && !(this.parent != null && !this.parent.isShouldStopRequired)
    +  }
    +
    +  /**
    +   * Set to false if this plan consumes all rows produced by children but doesn't output row
    +   * to buffer by calling append(), so the children don't require shouldStop()
    +   * in the loop of producing rows.
    +   */
    +  protected def shouldStopRequired: Boolean = true
    --- End diff --
    
    I agree with you. I just confirmed that this PR is not effective for Sort.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r104078113
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,18 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    --- End diff --
    
    Suggestion for the comment:
    
        /**
         * For optimization to suppress `shouldStop()` in a loop of WholeStageCodegen.
         * Returning true means we need to insert `shouldStop()` into the loop producing rows, if any.
         */




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73702/
    Test PASSed.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103878480
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,18 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    +   *
    +   * isShouldStopRequired: require to insert shouldStop() into the loop if true
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    shouldStopRequired || (this.parent != null && this.parent.isShouldStopRequired)
    +  }
    +
    +  // set true if doConsume() inserts append() method that requires shouldStop() in the loop
    +  protected var shouldStopRequired: Boolean = false
    --- End diff --
    
    Do you think it is possible to do this without a mutable var? Code generation has way too much mutable state as it is.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #73689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73689/testReport)** for PR 17122 at commit [`47f405c`](https://github.com/apache/spark/commit/47f405c32ffac9b0356050c0d6bbb8c0ea5e0f51).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #74094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74094/testReport)** for PR 17122 at commit [`49d02e4`](https://github.com/apache/spark/commit/49d02e444e43e739f8814f41a2f769cf094e3a71).




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103978915
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,18 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    +   *
    +   * isShouldStopRequired: require to insert shouldStop() into the loop if true
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    shouldStopRequired || (this.parent != null && this.parent.isShouldStopRequired)
    +  }
    +
    +  // set true if doConsume() inserts append() method that requires shouldStop() in the loop
    +  protected var shouldStopRequired: Boolean = false
    --- End diff --
    
    I updated to use an immutable variable. 
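
    For readers following the thread, here is a minimal, self-contained sketch of what the immutable formulation might look like. This is not the real `CodegenSupport` trait; `CodegenSupportSketch` and `AggregateLikeSketch` are made-up stand-ins, and the combining condition uses the simpler equivalent form discussed elsewhere in this thread.

    ```scala
    // Hypothetical stand-ins, not the actual Spark classes.
    trait CodegenSupportSketch {
      var parent: CodegenSupportSketch = null

      // Immutable formulation: operators override a def instead of flipping a var.
      protected def shouldStopRequired: Boolean = true

      def isShouldStopRequired: Boolean =
        shouldStopRequired && (parent == null || parent.isShouldStopRequired)
    }

    // An aggregate-like operator that consumes all rows and never calls append(),
    // so its children do not need shouldStop() in their produce loops.
    class AggregateLikeSketch extends CodegenSupportSketch {
      override protected def shouldStopRequired: Boolean = false
    }

    object OverrideDemo {
      def main(args: Array[String]): Unit = {
        val agg = new AggregateLikeSketch
        val range = new CodegenSupportSketch {}
        range.parent = agg
        println(range.isShouldStopRequired) // false: the aggregate suppresses the check
      }
    }
    ```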




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103927340
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,18 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    +   *
    +   * isShouldStopRequired: require to insert shouldStop() into the loop if true
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    shouldStopRequired || (this.parent != null && this.parent.isShouldStopRequired)
    +  }
    +
    +  // set true if doConsume() inserts append() method that requires shouldStop() in the loop
    +  protected var shouldStopRequired: Boolean = false
    --- End diff --
    
    I see. I will rewrite this without a mutable var.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #73898 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73898/testReport)** for PR 17122 at commit [`e6740a5`](https://github.com/apache/spark/commit/e6740a5b72662028af043dc6164a99f8fce34fdd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103861360
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
    @@ -434,6 +434,17 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
         val input = ctx.freshName("input")
         // Right now, Range is only used when there is one upstream.
         ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];")
    +
    +    val localIdx = ctx.freshName("localIdx")
    +    val localEnd = ctx.freshName("localEnd")
    +    val range = ctx.freshName("range")
    +    // we need to place consume() before calling isShouldStopRequired
    --- End diff --
    
    Better to describe the reason that consume() may modify `shouldStopRequired`.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73898/
    Test PASSed.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103860895
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -77,6 +77,7 @@ trait CodegenSupport extends SparkPlan {
        */
       final def produce(ctx: CodegenContext, parent: CodegenSupport): String = executeQuery {
         this.parent = parent
    +
    --- End diff --
    
    extra space.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103858494
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -77,6 +77,10 @@ trait CodegenSupport extends SparkPlan {
        */
       final def produce(ctx: CodegenContext, parent: CodegenSupport): String = executeQuery {
         this.parent = parent
    +
    +    // to track the existence of apply() call in the current produce-consume cycle
    +    // if apply is not called (e.g. in aggregation), we can skip shouldStop in the inner-most loop
    +    parent.shouldStopRequired = false
    --- End diff --
    
    I wanted to ensure `produce()` starts with `parent.shouldStopRequired = false`, because I was afraid another produce-consume cycle might set `shouldStopRequired` to true if one parent has more than one produce-consume cycle.
    However, in most cases that would not happen, so for simplicity I eliminated this.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73689/
    Test PASSed.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r104090569
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,18 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    --- End diff --
    
    Thanks, done




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103731873
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +210,15 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /* for optimization */
    +  var shouldStopRequired: Boolean = false
    +
    +  def isShouldStopRequired: Boolean = {
    +    if (shouldStopRequired) return true
    --- End diff --
    
    Can you just write `shouldStopRequired || (this.parent != null && this.parent.isShouldStopRequired)`?




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r104076198
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,18 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    +   *
    +   * isShouldStopRequired: require to insert shouldStop() into the loop if true
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    return shouldStopRequired && !(this.parent != null && !this.parent.isShouldStopRequired)
    --- End diff --
    
    Is this easier to understand?
    
        def isShouldStopRequired: Boolean = {
          assert(this.parent != null)
          shouldStopRequired && this.parent.isShouldStopRequired
        }
    





[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #73781 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73781/testReport)** for PR 17122 at commit [`8704083`](https://github.com/apache/spark/commit/8704083a563732264beabef37b7a61cc38e9d9b4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #74094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74094/testReport)** for PR 17122 at commit [`49d02e4`](https://github.com/apache/spark/commit/49d02e444e43e739f8814f41a2f769cf094e3a71).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103862749
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /*
    +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    +   */
    +  // true: require to insert shouldStop() into a loop
    --- End diff --
    
    ?




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103841503
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
    @@ -69,6 +69,7 @@ trait BaseLimitExec extends UnaryExecNode with CodegenSupport {
       override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         val stopEarly = ctx.freshName("stopEarly")
         ctx.addMutableState("boolean", stopEarly, s"$stopEarly = false;")
    +    shouldStopRequired = true // loop may break early even without append in loop body
    --- End diff --
    
    If this `Limit` is the parent of an `Aggregate`, the final `shouldStopRequired` ends up true, but in that case the check could actually be skipped.
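
    If I read this correctly, a query shape like the following might hit that case. Purely illustrative (the object name and the `local[*]` master are mine, and whether the planner really places the limit node directly above the aggregate depends on the Spark version), but `explain()` makes it easy to check:

    ```scala
    import org.apache.spark.sql.SparkSession

    object LimitOverAggregateSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("limit-over-agg").getOrCreate()
        // A limit above an aggregate: the aggregate consumes every input row and
        // emits at most one row, so its children never need shouldStop() in their loops.
        val df = spark.range(1L << 20).selectExpr("count(id)").limit(1)
        df.explain()  // check where the limit node sits relative to the aggregate
        spark.stop()
      }
    }
    ```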




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103863351
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /*
    +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    +   */
    +  // true: require to insert shouldStop() into a loop
    --- End diff --
    
    Btw, the usual style is:
    
         /**
          * ....
          * 
          */




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #73781 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73781/testReport)** for PR 17122 at commit [`8704083`](https://github.com/apache/spark/commit/8704083a563732264beabef37b7a61cc38e9d9b4).




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    This optimization looks good and the improvement is great. I have some comments regarding the changes of `shouldStopRequired`.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103865202
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /*
    +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    +   */
    +  // true: require to insert shouldStop() into a loop
    --- End diff --
    
    Updated comments around here




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103838737
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -77,6 +77,10 @@ trait CodegenSupport extends SparkPlan {
        */
       final def produce(ctx: CodegenContext, parent: CodegenSupport): String = executeQuery {
         this.parent = parent
    +
    +    // to track the existence of apply() call in the current produce-consume cycle
    +    // if apply is not called (e.g. in aggregation), we can skip shoudStop in the inner-most loop
    +    parent.shouldStopRequired = false
    --- End diff --
    
    Do we need this? The default value of `shouldStopRequired` is already false.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    LGTM, except for the comment on the `isShouldStopRequired` condition.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #73734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73734/testReport)** for PR 17122 at commit [`9528ccc`](https://github.com/apache/spark/commit/9528ccc2d63d8c657d74f455dd2589d8e883d51c).




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    retest this please.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r104077831
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,18 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    +   *
    +   * isShouldStopRequired: require to insert shouldStop() into the loop if true
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    return shouldStopRequired && !(this.parent != null && !this.parent.isShouldStopRequired)
    +  }
    +
    +  // set false if doConsume() does not insert append() that requires shouldStop() in the loop
    --- End diff --
    
    Suggestion for the comment:
    
        /**
         * Set to false if this plan consumes all rows produced by children but doesn't output row to buffer
         * by calling `append()`, so the children don't require `shouldStop()` in the loop of producing rows.
         */




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r104308330
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,21 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * For optimization to suppress shouldStop() in a loop of WholeStageCodegen.
    +   * Returning true means we need to insert shouldStop() into the loop producing rows, if any.
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    return shouldStopRequired && !(this.parent != null && !this.parent.isShouldStopRequired)
    +  }
    +
    +  /**
    +   * Set to false if this plan consumes all rows produced by children but doesn't output row
    +   * to buffer by calling append(), so the children don't require shouldStop()
    +   * in the loop of producing rows.
    +   */
    +  protected def shouldStopRequired: Boolean = true
    --- End diff --
    
    Curious about the performance improvement on Sort.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103756297
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +210,15 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /* for optimization */
    +  var shouldStopRequired: Boolean = false
    +
    +  def isShouldStopRequired: Boolean = {
    +    if (shouldStopRequired) return true
    --- End diff --
    
    Thank you, it looks better. Done




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #73813 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73813/testReport)** for PR 17122 at commit [`7f095c0`](https://github.com/apache/spark/commit/7f095c0bdae1ff15859bec399fdd705bff379be0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103862946
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /*
    +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    +   */
    +  // true: require to insert shouldStop() into a loop
    --- End diff --
    
    Your comment style looks weird. Please put `true...` in the /*... */




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    LGTM cc @hvanhovell 




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103857751
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
    @@ -434,6 +434,17 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
         val input = ctx.freshName("input")
         // Right now, Range is only used when there is one upstream.
         ctx.addMutableState("scala.collection.Iterator", input, s"$input = inputs[0];")
    +
    +    val localIdx = ctx.freshName("localIdx")
    +    val localEnd = ctx.freshName("localEnd")
    +    val range = ctx.freshName("range")
    +    // we need to place consume() before calling isShouldStopRequired
    +    val body = consume(ctx, Seq(ev))
    +    val shouldStop = if (isShouldStopRequired) {
    --- End diff --
    
    I think that `isShouldStopRequired` is simple logic. It just checks whether `shouldStopRequired` or the parent's `shouldStopRequired` is true.
    
    There are two reasons why `isShouldStopRequired` is necessary.
    1. The improvement largely degrades from 7.6x to 5.5x without `isShouldStopRequired`.
    2. We may miss some opportunities to enable compiler optimizations, since the size of the loop body would increase without `isShouldStopRequired`. This is because a JIT compiler has a threshold on loop body size for applying loop optimizations such as loop unrolling (see the sketch after the benchmark below).
    
    ```
    OpenJDK 64-Bit Server VM 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14 on Linux 4.4.0-47-generic
    Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
    cnt:                                     Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    cnt                                            247 /  289       4340.6           0.2       1.0X
    ```
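    
    To make the loop-body-size argument concrete, here is a minimal standalone Java sketch (the class and method names are made up, and this is not Spark's generated code) of the inner loop with and without the shouldStop() check; the extra call and branch enlarge the loop body, which is what can push it past a JIT compiler's size threshold for optimizations such as loop unrolling.
    ```java
    // Minimal sketch, not Spark's generated code: the same range-style loop with and
    // without the shouldStop() check that this PR suppresses when it is not needed.
    public class ShouldStopSketch {
      // Stand-in for BufferedRowIterator.shouldStop(); always false here.
      private static boolean shouldStop() { return false; }
    
      static long sumWithCheck(long number, int localEnd, long step) {
        long sum = 0L;
        for (int localIdx = 0; localIdx < localEnd; localIdx++) {
          long value = ((long) localIdx * step) + number;
          sum += value;                    // stand-in for the consume() code
          if (shouldStop()) return sum;    // check kept inside the loop
        }
        return sum;
      }
    
      static long sumWithoutCheck(long number, int localEnd, long step) {
        long sum = 0L;
        for (int localIdx = 0; localIdx < localEnd; localIdx++) {
          long value = ((long) localIdx * step) + number;
          sum += value;                    // stand-in for the consume() code
          // shouldStop check is eliminated
        }
        return sum;
      }
    
      public static void main(String[] args) {
        System.out.println(sumWithCheck(0L, 1000, 1L) == sumWithoutCheck(0L, 1000, 1L)); // true
      }
    }
    ```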




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73734/
    Test FAILed.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r104592747
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,21 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * For optimization to suppress shouldStop() in a loop of WholeStageCodegen.
    +   * Returning true means we need to insert shouldStop() into the loop producing rows, if any.
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    return shouldStopRequired && !(this.parent != null && !this.parent.isShouldStopRequired)
    --- End diff --
    
    I think `shouldStopRequired && (this.parent == null || this.parent.isShouldStopRequired)` is easier to understand; the two forms are logically equivalent.
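    
    As a sanity check, here is a small standalone Java snippet (hypothetical class and method names; `Boolean parentRequired` stands in for the nullable `parent`) that exercises both forms over every combination of inputs:
    ```java
    // Hypothetical standalone check, not Spark code: the original guard and the
    // suggested rewrite agree for all inputs (De Morgan: !(p && !q) == (!p || q)).
    public class GuardEquivalence {
      // Original form: shouldStopRequired && !(parent != null && !parent.isShouldStopRequired)
      static boolean original(boolean required, Boolean parentRequired) {
        return required && !(parentRequired != null && !parentRequired);
      }
    
      // Suggested form: shouldStopRequired && (parent == null || parent.isShouldStopRequired)
      static boolean suggested(boolean required, Boolean parentRequired) {
        return required && (parentRequired == null || parentRequired);
      }
    
      public static void main(String[] args) {
        Boolean[] parentCases = {null, Boolean.FALSE, Boolean.TRUE};
        for (boolean required : new boolean[] {false, true}) {
          for (Boolean parentRequired : parentCases) {
            if (original(required, parentRequired) != suggested(required, parentRequired)) {
              throw new AssertionError("the two forms differ");
            }
          }
        }
        System.out.println("both forms agree on all inputs");
      }
    }
    ```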




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r104467009
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,21 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * For optimization to suppress shouldStop() in a loop of WholeStageCodegen.
    +   * Returning true means we need to insert shouldStop() into the loop producing rows, if any.
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    return shouldStopRequired && !(this.parent != null && !this.parent.isShouldStopRequired)
    +  }
    +
    +  /**
    +   * Set to false if this plan consumes all rows produced by children but doesn't output row
    +   * to buffer by calling append(), so the children don't require shouldStop()
    +   * in the loop of producing rows.
    +   */
    +  protected def shouldStopRequired: Boolean = true
    --- End diff --
    
    I cannot see a performance improvement for Sort. I think there are two reasons for this result.
    
    One is that the loop body is too large. The inner-most loop calls the [`insertRow` method](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java#L107-L120), and I think this method is too large to allow loop optimizations.
    
    The other is that the hot method is not this loop. I guess the hot method is `sort()` at line 154.
    
    Here is the generated code.
    ```java
    /* 114 */     while (true) {
    /* 115 */       long range_range = range_batchEnd - range_number;
    /* 116 */       if (range_range != 0L) {
    /* 117 */         int range_localEnd = (int)(range_range / -1L);
    /* 118 */         for (int range_localIdx = 0; range_localIdx < range_localEnd; range_localIdx++) {
    /* 119 */           long range_value = ((long)range_localIdx * -1L) + range_number;
    /* 120 */
    /* 121 */           range_rowWriter.write(0, range_value);
    /* 122 */           sort_sorter.insertRow((UnsafeRow)range_result);
    /* 123 */
    /* 124 */           // shouldStop check is eliminated
    /* 125 */         }
    /* 126 */         range_number = range_batchEnd;
    /* 127 */       }
    /* 128 */
    /* 129 */       if (range_taskContext.isInterrupted()) {
    /* 130 */         throw new TaskKilledException();
    /* 131 */       }
    /* 132 */
    /* 133 */       long range_nextBatchTodo;
    /* 134 */       if (range_numElementsTodo > 1000L) {
    /* 135 */         range_nextBatchTodo = 1000L;
    /* 136 */         range_numElementsTodo -= 1000L;
    /* 137 */       } else {
    /* 138 */         range_nextBatchTodo = range_numElementsTodo;
    /* 139 */         range_numElementsTodo = 0;
    /* 140 */         if (range_nextBatchTodo == 0) break;
    /* 141 */       }
    /* 142 */       range_numOutputRows.add(range_nextBatchTodo);
    /* 143 */       range_inputMetrics.incRecordsRead(range_nextBatchTodo);
    /* 144 */
    /* 145 */       range_batchEnd += range_nextBatchTodo * -1L;
    /* 146 */     }
    /* 147 */
    /* 148 */   }
    /* 149 */
    /* 150 */   protected void processNext() throws java.io.IOException {
    /* 151 */     if (sort_needToSort) {
    /* 152 */       long sort_spillSizeBefore = sort_metrics.memoryBytesSpilled();
    /* 153 */       sort_addToSorter();
    /* 154 */       sort_sortedIter = sort_sorter.sort();
    /* 155 */       sort_sortTime.add(sort_sorter.getSortTimeNanos() / 1000000);
    /* 156 */       sort_peakMemory.add(sort_sorter.getPeakMemoryUsage());
    /* 157 */       sort_spillSize.add(sort_metrics.memoryBytesSpilled() - sort_spillSizeBefore);
    /* 158 */       sort_metrics.incPeakExecutionMemory(sort_sorter.getPeakMemoryUsage());
    /* 159 */       sort_needToSort = false;
    /* 160 */     }
    /* 161 */
    /* 162 */     while (sort_sortedIter.hasNext()) {
    /* 163 */       UnsafeRow sort_outputRow = (UnsafeRow)sort_sortedIter.next();
    /* 164 */
    /* 165 */       append(sort_outputRow);
    /* 166 */
    /* 167 */       if (shouldStop()) return;
    /* 168 */     }
    /* 169 */   }
    ```




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103862272
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +207,13 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /* for optimization */
    --- End diff --
    
    I see. done.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73813/
    Test PASSed.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #73904 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73904/testReport)** for PR 17122 at commit [`c2f7939`](https://github.com/apache/spark/commit/c2f79390f25fe1b4bad11603385fe464505178a4).




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #73745 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73745/testReport)** for PR 17122 at commit [`6697928`](https://github.com/apache/spark/commit/6697928e4ff8cf93c4b63c9b6e4b18bec4a2f87a).




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #73730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73730/testReport)** for PR 17122 at commit [`5ff8dca`](https://github.com/apache/spark/commit/5ff8dcae1bce0b553d4aefc563addc001e6a6691).




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r104091814
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,18 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    +   *
    +   * isShouldStopRequired: require to insert shouldStop() into the loop if true
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    return shouldStopRequired && !(this.parent != null && !this.parent.isShouldStopRequired)
    --- End diff --
    
    Thank you for your suggestion. However, it caused an assertion failure in `"SPARK-7150 range api"` in DataFrameRangeSuite.
    
    In the failing case, `isShouldStopRequired` is called up the plan hierarchy through `parent`:
    `RangeExec -> FilterExec -> WholeStageCodegenExec`
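    
    A tiny standalone Java model of that chain (hypothetical `Node` class, not the real Spark operators; the defaults mirror `shouldStopRequired = true` from the diff above) shows how the requirement is evaluated up through `parent`:
    ```java
    // Hypothetical model of the parent chain RangeExec -> FilterExec -> WholeStageCodegenExec.
    // Each node keeps a link to its parent and asks it whether shouldStop() is still required.
    public class ParentChainSketch {
      static final class Node {
        final Node parent;
        final boolean shouldStopRequired;
        Node(Node parent, boolean shouldStopRequired) {
          this.parent = parent;
          this.shouldStopRequired = shouldStopRequired;
        }
        boolean isShouldStopRequired() {
          return shouldStopRequired && (parent == null || parent.isShouldStopRequired());
        }
      }
    
      public static void main(String[] args) {
        Node wholeStageCodegen = new Node(null, true);    // top of the chain
        Node filter = new Node(wholeStageCodegen, true);  // keeps the default (true)
        Node range = new Node(filter, true);              // walks up through parent
        System.out.println(range.isShouldStopRequired()); // true: keep shouldStop() in the loop
      }
    }
    ```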




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73781/
    Test PASSed.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Have a few suggestions about the code comments. Otherwise LGTM.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103862257
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -77,6 +77,7 @@ trait CodegenSupport extends SparkPlan {
        */
       final def produce(ctx: CodegenContext, parent: CodegenSupport): String = executeQuery {
         this.parent = parent
    +
    --- End diff --
    
    good catch. done.




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103863428
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /*
    +   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
    +   */
    +  // true: require to insert shouldStop() into a loop
    +  protected var shouldStopRequired: Boolean = false
    +
    --- End diff --
    
    Please add a simple comment.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #73898 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73898/testReport)** for PR 17122 at commit [`e6740a5`](https://github.com/apache/spark/commit/e6740a5b72662028af043dc6164a99f8fce34fdd).




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    **[Test build #73813 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73813/testReport)** for PR 17122 at commit [`7f095c0`](https://github.com/apache/spark/commit/7f095c0bdae1ff15859bec399fdd705bff379be0).




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r103839637
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
    @@ -442,11 +453,15 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
           | }
           |
           | while (true) {
    -      |   while ($number != $batchEnd) {
    -      |     long $value = $number;
    -      |     $number += ${step}L;
    -      |     ${consume(ctx, Seq(ev))}
    -      |     if (shouldStop()) return;
    +      |   long $range = $batchEnd - $number;
    +      |   if ($range != 0L) {
    +      |     int $localEnd = (int)($range / ${step}L);
    +      |     for (int $localIdx = 0; $localIdx < $localEnd; $localIdx++) {
    +      |       long $value = ((long)$localIdx * ${step}L) + $number;
    +      |       $body
    +      |       $shouldStop
    --- End diff --
    
    I think in most cases we need the `shouldStop` check. But currently `shouldStopRequired` is false by default, so you will consume many additional rows now.
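    
    For readers following the thread, here is a small standalone Java sketch (made-up class and method names, not the generated code itself) of the two loop shapes the diff above switches between: the old while-loop driven by long variables and the new for-loop with an int local index, which a JIT compiler can unroll more easily:
    ```java
    // Sketch only: the same iteration written as a while-loop over longs (old shape)
    // and as a for-loop with an int local index (new shape). Both compute the same sum.
    public class LoopShapeSketch {
      // Old shape: while-loop with long loop variables.
      static long sumWhile(long start, long batchEnd, long step) {
        long number = start;
        long sum = 0L;
        while (number != batchEnd) {
          long value = number;
          number += step;
          sum += value;    // stand-in for the consume() code
        }
        return sum;
      }
    
      // New shape: int-indexed for-loop over the batch, as in the new generated code.
      static long sumFor(long start, long batchEnd, long step) {
        long range = batchEnd - start;
        long sum = 0L;
        if (range != 0L) {
          int localEnd = (int) (range / step);
          for (int localIdx = 0; localIdx < localEnd; localIdx++) {
            long value = ((long) localIdx * step) + start;
            sum += value;  // stand-in for the consume() code
          }
        }
        return sum;
      }
    
      public static void main(String[] args) {
        System.out.println(sumWhile(0L, 1000L, 1L) == sumFor(0L, 1000L, 1L)); // true
      }
    }
    ```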




[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17122#discussion_r104472986
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
    @@ -206,6 +206,21 @@ trait CodegenSupport extends SparkPlan {
       def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
         throw new UnsupportedOperationException
       }
    +
    +  /**
    +   * For optimization to suppress shouldStop() in a loop of WholeStageCodegen.
    +   * Returning true means we need to insert shouldStop() into the loop producing rows, if any.
    +   */
    +  def isShouldStopRequired: Boolean = {
    +    return shouldStopRequired && !(this.parent != null && !this.parent.isShouldStopRequired)
    +  }
    +
    +  /**
    +   * Set to false if this plan consumes all rows produced by children but doesn't output row
    +   * to buffer by calling append(), so the children don't require shouldStop()
    +   * in the loop of producing rows.
    +   */
    +  protected def shouldStopRequired: Boolean = true
    --- End diff --
    
    The only reason I mentioned sort is that there is no use in stopping early and that it would not be correct to do so. I was not really expecting any improvement.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73737/
    Test FAILed.




[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17122
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73745/
    Test PASSed.

