You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2016/01/31 07:59:06 UTC

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/10998

    [SPARK-12951] [SQL] support spilling in generated aggregate

    This PR add spilling support for generated TungstenAggregate.
    
    The changes will be covered by TungstenAggregationQueryWithControlledFallbackSuite

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark gen_spilling

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10998.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10998
    
----
commit 07c4119378402e1b245f53202923626e1e29b725
Author: Davies Liu <da...@databricks.com>
Date:   2016-01-31T06:52:39Z

    support spilling in generated aggregate

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-177461290
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10998#discussion_r51633120
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala ---
    @@ -287,39 +288,98 @@ case class TungstenAggregate(
         GenerateUnsafeRowJoiner.create(groupingKeySchema, bufferSchema)
       }
     
    -
       /**
    -    * Update peak execution memory, called in generated Java class.
    +    * Called by generated Java class to finish the aggregate and return a KVIterator.
         */
    -  def updatePeakMemory(hashMap: UnsafeFixedWidthAggregationMap): Unit = {
    +  def finishAggregate(
    +      hashMap: UnsafeFixedWidthAggregationMap,
    +      sorter: UnsafeKVExternalSorter): KVIterator[UnsafeRow, UnsafeRow] = {
    +
    +    // update peak execution memory
         val mapMemory = hashMap.getPeakMemoryUsedBytes
    +    val sorterMemory = Option(sorter).map(_.getPeakMemoryUsedBytes).getOrElse(0L)
    +    val peakMemory = Math.max(mapMemory, sorterMemory)
         val metrics = TaskContext.get().taskMetrics()
    -    metrics.incPeakExecutionMemory(mapMemory)
    -  }
    +    metrics.incPeakExecutionMemory(peakMemory)
     
    -  private def doProduceWithKeys(ctx: CodegenContext): String = {
    -    val initAgg = ctx.freshName("initAgg")
    -    ctx.addMutableState("boolean", initAgg, s"$initAgg = false;")
    +    if (sorter == null) {
    +      // not spilled
    +      return hashMap.iterator()
    +    }
     
    -    // create hashMap
    -    val thisPlan = ctx.addReferenceObj("plan", this)
    -    hashMapTerm = ctx.freshName("hashMap")
    -    val hashMapClassName = classOf[UnsafeFixedWidthAggregationMap].getName
    -    ctx.addMutableState(hashMapClassName, hashMapTerm, s"$hashMapTerm = $thisPlan.createHashMap();")
    +    // merge the final hashMap into sorter
    +    sorter.merge(hashMap.destructAndCreateExternalSorter())
    --- End diff --
    
    Here you call free after destructAndCreate() but in the other places you don't. Do you need to?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178870708
  
    **[Test build #50598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50598/consoleFull)** for PR 10998 at commit [`0c52897`](https://github.com/apache/spark/commit/0c528970776afa03d6058cfc6f6beb30ff5be100).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178816722
  
    This naming is getting really hard to follow. We should as a follow up make the variable names better. In this case for example, it should be clear if the value's are input values or intermediate result values.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10998#discussion_r51632943
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala ---
    @@ -426,14 +510,39 @@ case class TungstenAggregate(
           ctx.updateColumn(buffer, dt, i, ev, updateExpr(i).nullable)
         }
     
    +    val countTerm = ctx.freshName("count")
    +    ctx.addMutableState("int", countTerm, s"$countTerm = 0;")
    +    val checkFallback = if (testFallbackStartsAt.isDefined) {
    +      s"$countTerm < ${testFallbackStartsAt.get}"
    +    } else {
    +      "true"
    +    }
    +
    +    // We try to do hash map based in-memory aggregation first. If there is not enough memory (the
    +    // hash map will return null for new key), we spill the hash map to disk to free memory, then
    +    // continue to do in-memory aggregation and spilling until all the rows had been processed.
    +    // Finally, sort the spilled aggregate buffers by key, and merge them together for same key.
         s"""
          // generate grouping key
          ${keyCode.code}
    -     UnsafeRow $buffer = $hashMapTerm.getAggregationBufferFromUnsafeRow($key);
    +     UnsafeRow $buffer = null;
    +     if ($checkFallback) {
    --- End diff --
    
    I don't get this. You seem to do a look up twice and line (539) . Is that intentional? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by hvanhovell <gi...@git.apache.org>.

Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-177442867
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-177461073
  
    **[Test build #50460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50460/consoleFull)** for PR 10998 at commit [`07c4119`](https://github.com/apache/spark/commit/07c4119378402e1b245f53202923626e1e29b725).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178169324
  
    Can you include the generated code?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178870796
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-177461291
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50460/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178787172
  
    For query:
    ```
    sqlContext.range(values).selectExpr("(id & 65535) as k").groupBy("k").sum().collect()
    ```
    will generate:
    ```
    /* 001 */
    /* 002 */ public Object generate(Object[] references) {
    /* 003 */   return new GeneratedIterator(references);
    /* 004 */ }
    /* 005 */
    /* 006 */ class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
    /* 007 */
    /* 008 */   private Object[] references;
    /* 009 */   private boolean agg_initAgg0;
    /* 010 */   private org.apache.spark.sql.execution.aggregate.TungstenAggregate agg_plan1;
    /* 011 */   private org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap agg_hashMap2;
    /* 012 */   private org.apache.spark.sql.execution.UnsafeKVExternalSorter agg_sorter3;
    /* 013 */   private org.apache.spark.unsafe.KVIterator agg_mapIter4;
    /* 014 */   private boolean range_initRange6;
    /* 015 */   private long range_partitionEnd7;
    /* 016 */   private long range_number8;
    /* 017 */   private boolean range_overflow9;
    /* 018 */   private UnsafeRow agg_result19;
    /* 019 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder20;
    /* 020 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter agg_rowWriter21;
    /* 021 */   private int agg_count38;
    /* 022 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowJoiner agg_unsafeRowJoiner41;
    /* 023 */
    /* 024 */   private void initRange(int idx) {
    /* 025 */     java.math.BigInteger index = java.math.BigInteger.valueOf(idx);
    /* 026 */     java.math.BigInteger numSlice = java.math.BigInteger.valueOf(1L);
    /* 027 */     java.math.BigInteger numElement = java.math.BigInteger.valueOf(20971520L);
    /* 028 */     java.math.BigInteger step = java.math.BigInteger.valueOf(1L);
    /* 029 */     java.math.BigInteger start = java.math.BigInteger.valueOf(0L);
    /* 030 */
    /* 031 */     java.math.BigInteger st = index.multiply(numElement).divide(numSlice).multiply(step).add(start);
    /* 032 */     if (st.compareTo(java.math.BigInteger.valueOf(Long.MAX_VALUE)) > 0) {
    /* 033 */       range_number8 = Long.MAX_VALUE;
    /* 034 */     } else if (st.compareTo(java.math.BigInteger.valueOf(Long.MIN_VALUE)) < 0) {
    /* 035 */       range_number8 = Long.MIN_VALUE;
    /* 036 */     } else {
    /* 037 */       range_number8 = st.longValue();
    /* 038 */     }
    /* 039 */
    /* 040 */     java.math.BigInteger end = index.add(java.math.BigInteger.ONE).multiply(numElement).divide(numSlice)
    /* 041 */     .multiply(step).add(start);
    /* 042 */     if (end.compareTo(java.math.BigInteger.valueOf(Long.MAX_VALUE)) > 0) {
    /* 043 */       range_partitionEnd7 = Long.MAX_VALUE;
    /* 044 */     } else if (end.compareTo(java.math.BigInteger.valueOf(Long.MIN_VALUE)) < 0) {
    /* 045 */       range_partitionEnd7 = Long.MIN_VALUE;
    /* 046 */     } else {
    /* 047 */       range_partitionEnd7 = end.longValue();
    /* 048 */     }
    /* 049 */   }
    /* 050 */
    /* 051 */
    /* 052 */   private void agg_doAggregateWithKeys5() throws java.io.IOException {
    /* 053 */
    /* 054 */     // initialize Range
    /* 055 */     if (!range_initRange6) {
    /* 056 */       range_initRange6 = true;
    /* 057 */       if (input.hasNext()) {
    /* 058 */         initRange(((InternalRow) input.next()).getInt(0));
    /* 059 */       } else {
    /* 060 */         return;
    /* 061 */       }
    /* 062 */     }
    /* 063 */
    /* 064 */     while (!range_overflow9 && range_number8 < range_partitionEnd7) {
    /* 065 */       long range_value10 = range_number8;
    /* 066 */       range_number8 += 1L;
    /* 067 */       if (range_number8 < range_value10 ^ 1L < 0) {
    /* 068 */         range_overflow9 = true;
    /* 069 */       }
    /* 070 */
    /* 071 */       /* (input[0, bigint] & 65535) */
    /* 072 */       /* input[0, bigint] */
    /* 073 */
    /* 074 */       /* 65535 */
    /* 075 */
    /* 076 */       long project_value12 = -1L;
    /* 077 */       project_value12 = range_value10 & 65535L;
    /* 078 */
    /* 079 */
    /* 080 */       // generate grouping key
    /* 081 */
    /* 082 */
    /* 083 */
    /* 084 */       /* input[0, bigint] */
    /* 085 */
    /* 086 */       agg_rowWriter21.write(0, project_value12);
    /* 087 */
    /* 088 */
    /* 089 */       UnsafeRow agg_aggBuffer23 = null;
    /* 090 */       if (true) {
    /* 091 */         agg_aggBuffer23 = agg_hashMap2.getAggregationBufferFromUnsafeRow(agg_result19);
    /* 092 */       }
    /* 093 */       if (agg_aggBuffer23 == null) {
    /* 094 */         if (agg_sorter3 == null) {
    /* 095 */           agg_sorter3 = agg_hashMap2.destructAndCreateExternalSorter();
    /* 096 */         } else {
    /* 097 */           agg_sorter3.merge(agg_hashMap2.destructAndCreateExternalSorter());
    /* 098 */         }
    /* 099 */         agg_count38 = 0;
    /* 100 */         agg_aggBuffer23 = agg_hashMap2.getAggregationBufferFromUnsafeRow(agg_result19);
    /* 101 */         if (agg_aggBuffer23 == null) {
    /* 102 */           // failed to allocate the first page
    /* 103 */           throw new OutOfMemoryError("No enough memory for aggregation");
    /* 104 */         }
    /* 105 */       }
    /* 106 */       agg_count38 += 1;
    /* 107 */
    /* 108 */       // evaluate aggregate function
    /* 109 */       /* (coalesce(input[0, bigint],cast(0 as bigint)) + cast(input[1, bigint] as bigint)) */
    /* 110 */       /* coalesce(input[0, bigint],cast(0 as bigint)) */
    /* 111 */       /* input[0, bigint] */
    /* 112 */       boolean agg_isNull28 = agg_aggBuffer23.isNullAt(0);
    /* 113 */       long agg_value29 = agg_isNull28 ? -1L : (agg_aggBuffer23.getLong(0));
    /* 114 */       boolean agg_isNull26 = agg_isNull28;
    /* 115 */       long agg_value27 = agg_value29;
    /* 116 */
    /* 117 */       if (agg_isNull26) {
    /* 118 */         /* cast(0 as bigint) */
    /* 119 */         /* 0 */
    /* 120 */
    /* 121 */         boolean agg_isNull30 = false;
    /* 122 */         long agg_value31 = -1L;
    /* 123 */         if (!false) {
    /* 124 */           agg_value31 = (long) 0;
    /* 125 */         }
    /* 126 */         if (!agg_isNull30) {
    /* 127 */           agg_isNull26 = false;
    /* 128 */           agg_value27 = agg_value31;
    /* 129 */         }
    /* 130 */       }
    /* 131 */       /* cast(input[1, bigint] as bigint) */
    /* 132 */       /* input[1, bigint] */
    /* 133 */
    /* 134 */       boolean agg_isNull34 = false;
    /* 135 */       long agg_value35 = -1L;
    /* 136 */       if (!false) {
    /* 137 */         agg_value35 = project_value12;
    /* 138 */       }
    /* 139 */       long agg_value25 = -1L;
    /* 140 */       agg_value25 = agg_value27 + agg_value35;
    /* 141 */       // update aggregate buffer
    /* 142 */       agg_aggBuffer23.setLong(0, agg_value25);
    /* 143 */
    /* 144 */
    /* 145 */     }
    /* 146 */
    /* 147 */
    /* 148 */     agg_mapIter4 = agg_plan1.finishAggregate(agg_hashMap2, agg_sorter3);
    /* 149 */   }
    /* 150 */
    /* 151 */
    /* 152 */   public GeneratedIterator(Object[] references) {
    /* 153 */     this.references = references;
    /* 154 */     agg_initAgg0 = false;
    /* 155 */     this.agg_plan1 = (org.apache.spark.sql.execution.aggregate.TungstenAggregate) references[0];
    /* 156 */     agg_hashMap2 = agg_plan1.createHashMap();
    /* 157 */
    /* 158 */
    /* 159 */     range_initRange6 = false;
    /* 160 */     range_partitionEnd7 = 0L;
    /* 161 */     range_number8 = 0L;
    /* 162 */     range_overflow9 = false;
    /* 163 */     agg_result19 = new UnsafeRow(1);
    /* 164 */     this.agg_holder20 = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_result19, 0);
    /* 165 */     this.agg_rowWriter21 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_holder20, 1);
    /* 166 */     agg_count38 = 0;
    /* 167 */     agg_unsafeRowJoiner41 = agg_plan1.createUnsafeJoiner();
    /* 168 */   }
    /* 169 */
    /* 170 */   protected void processNext() throws java.io.IOException {
    /* 171 */
    /* 172 */     if (!agg_initAgg0) {
    /* 173 */       agg_initAgg0 = true;
    /* 174 */       agg_doAggregateWithKeys5();
    /* 175 */     }
    /* 176 */
    /* 177 */     // output the result
    /* 178 */     while (agg_mapIter4.next()) {
    /* 179 */       UnsafeRow agg_aggKey39 = (UnsafeRow) agg_mapIter4.getKey();
    /* 180 */       UnsafeRow agg_aggBuffer40 = (UnsafeRow) agg_mapIter4.getValue();
    /* 181 */
    /* 182 */       UnsafeRow agg_resultRow42 = agg_unsafeRowJoiner41.join(agg_aggKey39, agg_aggBuffer40);
    /* 183 */
    /* 184 */       currentRow = agg_resultRow42;
    /* 185 */       return;
    /* 186 */
    /* 187 */
    /* 188 */     }
    /* 189 */
    /* 190 */     agg_mapIter4.close();
    /* 191 */     if (agg_sorter3 == null) {
    /* 192 */       agg_hashMap2.free();
    /* 193 */     }
    /* 194 */
    /* 195 */   }
    /* 196 */ }
    /* 197 */
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178870799
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50598/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178167285
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50494/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178166922
  
    **[Test build #50494 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50494/consoleFull)** for PR 10998 at commit [`02f2a25`](https://github.com/apache/spark/commit/02f2a25346edd718d9371955759478408e77382b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178953808
  
    **[Test build #2497 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2497/consoleFull)** for PR 10998 at commit [`0c52897`](https://github.com/apache/spark/commit/0c528970776afa03d6058cfc6f6beb30ff5be100).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178816082
  
    In the generated code, what is agg_count38 used for?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10998


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178837831
  
    @davies If this debugging is generally useful to have, can comment it in the code and possibly update the variable to be more specific than "count". If this is a one off you used for this, let's remove it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10998#discussion_r51633281
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala ---
    @@ -426,14 +510,39 @@ case class TungstenAggregate(
           ctx.updateColumn(buffer, dt, i, ev, updateExpr(i).nullable)
         }
     
    +    val countTerm = ctx.freshName("count")
    +    ctx.addMutableState("int", countTerm, s"$countTerm = 0;")
    +    val checkFallback = if (testFallbackStartsAt.isDefined) {
    +      s"$countTerm < ${testFallbackStartsAt.get}"
    +    } else {
    +      "true"
    +    }
    +
    +    // We try to do hash map based in-memory aggregation first. If there is not enough memory (the
    +    // hash map will return null for new key), we spill the hash map to disk to free memory, then
    +    // continue to do in-memory aggregation and spilling until all the rows had been processed.
    +    // Finally, sort the spilled aggregate buffers by key, and merge them together for same key.
         s"""
          // generate grouping key
          ${keyCode.code}
    -     UnsafeRow $buffer = $hashMapTerm.getAggregationBufferFromUnsafeRow($key);
    +     UnsafeRow $buffer = null;
    +     if ($checkFallback) {
    --- End diff --
    
    Nvm. I get it now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-177446928
  
    **[Test build #50460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50460/consoleFull)** for PR 10998 at commit [`07c4119`](https://github.com/apache/spark/commit/07c4119378402e1b245f53202923626e1e29b725).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178935482
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178820506
  
    @nongli agg_count38 is used for checking fallback (only for testing).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-177420351
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178128143
  
    **[Test build #50494 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50494/consoleFull)** for PR 10998 at commit [`02f2a25`](https://github.com/apache/spark/commit/02f2a25346edd718d9371955759478408e77382b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178986524
  
    Merged into master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10998#discussion_r51661898
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala ---
    @@ -426,14 +510,39 @@ case class TungstenAggregate(
           ctx.updateColumn(buffer, dt, i, ev, updateExpr(i).nullable)
         }
     
    +    val countTerm = ctx.freshName("count")
    +    ctx.addMutableState("int", countTerm, s"$countTerm = 0;")
    +    val checkFallback = if (testFallbackStartsAt.isDefined) {
    +      s"$countTerm < ${testFallbackStartsAt.get}"
    +    } else {
    +      "true"
    +    }
    +
    +    // We try to do hash map based in-memory aggregation first. If there is not enough memory (the
    +    // hash map will return null for new key), we spill the hash map to disk to free memory, then
    +    // continue to do in-memory aggregation and spilling until all the rows had been processed.
    +    // Finally, sort the spilled aggregate buffers by key, and merge them together for same key.
         s"""
          // generate grouping key
          ${keyCode.code}
    -     UnsafeRow $buffer = $hashMapTerm.getAggregationBufferFromUnsafeRow($key);
    +     UnsafeRow $buffer = null;
    +     if ($checkFallback) {
    --- End diff --
    
    Add comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-177416239
  
    Can you add some documentation in the code somewhere appropriate to explain the high level flow (e.g. when X happens, we switch to Y through Z).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178857047
  
    **[Test build #50598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50598/consoleFull)** for PR 10998 at commit [`0c52897`](https://github.com/apache/spark/commit/0c528970776afa03d6058cfc6f6beb30ff5be100).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178167283
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10998#discussion_r51633650
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala ---
    @@ -426,14 +510,39 @@ case class TungstenAggregate(
           ctx.updateColumn(buffer, dt, i, ev, updateExpr(i).nullable)
         }
     
    +    val countTerm = ctx.freshName("count")
    +    ctx.addMutableState("int", countTerm, s"$countTerm = 0;")
    +    val checkFallback = if (testFallbackStartsAt.isDefined) {
    +      s"$countTerm < ${testFallbackStartsAt.get}"
    +    } else {
    +      "true"
    +    }
    +
    +    // We try to do hash map based in-memory aggregation first. If there is not enough memory (the
    +    // hash map will return null for new key), we spill the hash map to disk to free memory, then
    +    // continue to do in-memory aggregation and spilling until all the rows had been processed.
    +    // Finally, sort the spilled aggregate buffers by key, and merge them together for same key.
         s"""
          // generate grouping key
          ${keyCode.code}
    -     UnsafeRow $buffer = $hashMapTerm.getAggregationBufferFromUnsafeRow($key);
    +     UnsafeRow $buffer = null;
    +     if ($checkFallback) {
    --- End diff --
    
    if it is confusing we should consider adding documentations


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-177420354
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50458/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-12951] [SQL] support spilling in genera...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10998#issuecomment-178905941
  
    **[Test build #2497 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2497/consoleFull)** for PR 10998 at commit [`0c52897`](https://github.com/apache/spark/commit/0c528970776afa03d6058cfc6f6beb30ff5be100).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org