Posted to reviews@spark.apache.org by sameeragarwal <gi...@git.apache.org> on 2016/04/05 03:07:33 UTC

[GitHub] spark pull request: [WIP][SPARK-14394][SQL] Generate AggregateHash...

GitHub user sameeragarwal opened a pull request:

    https://github.com/apache/spark/pull/12161

    [WIP][SPARK-14394][SQL] Generate AggregateHashMap class during TungstenAggregate codegen

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    
    
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sameeragarwal/spark tungsten-aggregate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12161.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12161
    
----
commit a1fe9f83c182ea02b6c9a8825ba90f10a5e6d638
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-03-31T21:15:34Z

    Make ColumnarBatch.Row mutable

commit 2ea924397c45ad69e88b38c5a8a2ea9d7b926a64
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-03-31T21:15:34Z

    Make ColumnarBatch.Row mutable

commit 3c54cd0072bc7c3191419c2a9a379b9377941152
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-04-01T06:12:42Z

    insert hashmap

commit ec14a7f63106917aecf9cc4e372436cfe7f8ac52
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-04-01T18:50:36Z

    initial attempts

commit 2002131156909e42d0617e684a7f6fa373699d92
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-04-02T00:14:44Z

    CR

commit 85e3c908a86c8f4d3d994360f72ea64c64b8bc5b
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-04-04T18:47:37Z

    Merge branch 'mutable-row' of github.com:sameeragarwal/spark into tungsten-aggregate

commit 1f5a60fbc60436f0409cd4e7ec4b12937b6e4294
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-04-04T23:45:30Z

    codegened

commit 8a47e1ea9886a3389d7254da4391fa2446e74b40
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-04-05T00:48:33Z

    Generate codegened Hashmap

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206025288
  
    **[Test build #55036 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55036/consoleFull)** for PR 12161 at commit [`a31be48`](https://github.com/apache/spark/commit/a31be487e9f369c0eb30c4e22df85765867a3478).




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12161#discussion_r58978723
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregateHashMap.scala ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.aggregate
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
    +import org.apache.spark.sql.types.StructType
    +
    +class TungstenAggregateHashMap(
    +    ctx: CodegenContext,
    +    generatedClassName: String,
    +    groupingKeySchema: StructType,
    +    bufferSchema: StructType) {
    +  val groupingKeys = groupingKeySchema.map(k => (k.dataType.typeName, ctx.freshName("key")))
    +  val bufferValues = bufferSchema.map(k => (k.dataType.typeName, ctx.freshName("value")))
    +  val groupingKeySignature = groupingKeys.map(_.productIterator.toList.mkString(" ")).mkString(", ")
    +
    +  def generate(): String = {
    +    s"""
    +       |public class $generatedClassName {
    +       |${initializeAggregateHashMap()}
    +       |
    +       |${generateFindOrInsert()}
    +       |
    +       |${generateEquals()}
    +       |
    +       |${generateHashFunction()}
    +       |}
    +     """.stripMargin
    +  }
    +
    +  def initializeAggregateHashMap(): String = {
    +    val generatedSchema: String =
    +      s"""
    +         |new org.apache.spark.sql.types.StructType()
    +         |${(groupingKeySchema ++ bufferSchema).map(key =>
    +          s""".add("${key.name}", org.apache.spark.sql.types.DataTypes.${key.dataType})""")
    +          .mkString("\n")};
    +      """.stripMargin
    +
    +    s"""
    +       |  private org.apache.spark.sql.execution.vectorized.ColumnarBatch batch;
    +       |  private int[] buckets;
    +       |  private int numBuckets;
    +       |  private int maxSteps;
    +       |  private int numRows = 0;
    +       |  private org.apache.spark.sql.types.StructType schema = $generatedSchema
    +       |
    +       |  public $generatedClassName(int capacity, double loadFactor, int maxSteps) {
    +       |    assert (capacity > 0 && ((capacity & (capacity - 1)) == 0));
    +       |    this.maxSteps = maxSteps;
    +       |    numBuckets = (int) (capacity / loadFactor);
    +       |    batch = org.apache.spark.sql.execution.vectorized.ColumnarBatch.allocate(schema,
    +       |      org.apache.spark.memory.MemoryMode.ON_HEAP, capacity);
    +       |    buckets = new int[numBuckets];
    +       |    java.util.Arrays.fill(buckets, -1);
    +       |  }
    +       |
    +       |  public $generatedClassName() {
    +       |    new $generatedClassName(1 << 16, 0.25, 5);
    +       |  }
    +     """.stripMargin
    +  }
    +
    +  def generateHashFunction(): String = {
    --- End diff --
    
    One thing that might be useful is to include the actual generated code as comments.
    
    Same for generateEquals and generateFindOrInsert.
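
For illustration only, here is a minimal sketch of roughly what the emitted Java could look like for two `long` grouping keys and one `long` buffer value. It is not the output of this PR: plain `long[]` columns stand in for `ColumnarBatch` so the sketch runs without Spark, the class name and the `getValue`/`setValue` helpers are hypothetical, and the linear probing in `findOrInsert` is an assumption because that part of the diff is not shown here. The no-arg constructor delegates with `this(...)` so that the defaults initialize the instance being constructed.

```java
// Hypothetical stand-in for the class the generator would emit for two long
// grouping keys and one long buffer value. Plain long[] columns replace
// ColumnarBatch so the sketch runs without Spark on the classpath.
final class GeneratedAggregateHashMapSketch {
  private final long[] key0Column;
  private final long[] key1Column;
  private final long[] valueColumn;
  private final int[] buckets;   // bucket -> row index, -1 when empty
  private final int capacity;
  private final int numBuckets;
  private final int maxSteps;
  private int numRows = 0;

  GeneratedAggregateHashMapSketch(int capacity, double loadFactor, int maxSteps) {
    if (capacity <= 0 || (capacity & (capacity - 1)) != 0) {
      throw new IllegalArgumentException("capacity must be a positive power of two");
    }
    this.capacity = capacity;
    this.maxSteps = maxSteps;
    this.numBuckets = (int) (capacity / loadFactor);
    this.key0Column = new long[capacity];
    this.key1Column = new long[capacity];
    this.valueColumn = new long[capacity];
    this.buckets = new int[numBuckets];
    java.util.Arrays.fill(buckets, -1);
  }

  // Delegates with this(...) so the defaults apply to this instance.
  GeneratedAggregateHashMapSketch() {
    this(1 << 16, 0.25, 5);
  }

  // Placeholder hash taken from the diff: bitwise AND of the keys.
  private long hash(long key0, long key1) {
    return key0 & key1;
  }

  // Compares the keys stored in the row that bucket idx points to.
  private boolean equals(int idx, long key0, long key1) {
    return key0Column[buckets[idx]] == key0 && key1Column[buckets[idx]] == key1;
  }

  // Assumed behaviour: linear probing for at most maxSteps buckets.
  // Returns the row index for the keys, or -1 if no slot could be used.
  int findOrInsert(long key0, long key1) {
    int idx = (int) Math.floorMod(hash(key0, key1), (long) numBuckets);
    for (int step = 0; step < maxSteps; step++) {
      if (buckets[idx] == -1) {
        if (numRows == capacity) {
          return -1;  // no room left for a new row
        }
        key0Column[numRows] = key0;
        key1Column[numRows] = key1;
        buckets[idx] = numRows;
        return numRows++;
      }
      if (equals(idx, key0, key1)) {
        return buckets[idx];
      }
      idx = (idx + 1) % numBuckets;
    }
    return -1;
  }

  long getValue(int row) { return valueColumn[row]; }

  void setValue(int row, long value) { valueColumn[row] = value; }
}
```

A caller would look up the row once per input record, for example `int row = map.findOrInsert(k0, k1);`, and update the buffer with `map.setValue(row, map.getValue(row) + 1)` when `row` is not `-1`.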




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207267713
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12161#discussion_r59043603
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregateHashMap.scala ---
    @@ -0,0 +1,132 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.aggregate
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
    +import org.apache.spark.sql.types.StructType
    +
    +class TungstenAggregateHashMap(
    +    ctx: CodegenContext,
    +    generatedClassName: String,
    +    groupingKeySchema: StructType,
    +    bufferSchema: StructType) {
    +  val groupingKeys = groupingKeySchema.map(key => (key.dataType.typeName, ctx.freshName("key")))
    +  val bufferValues = bufferSchema.map(key => (ctx.freshName("value"), key.dataType.typeName))
    +  val groupingKeySignature = groupingKeys.map(_.productIterator.toList.mkString(" ")).mkString(", ")
    +
    +  def generate(): String = {
    +    s"""
    +       |public class $generatedClassName {
    +       |${initializeAggregateHashMap()}
    +       |
    +       |${generateFindOrInsert()}
    +       |
    +       |${generateEquals()}
    +       |
    +       |${generateHashFunction()}
    +       |}
    +     """.stripMargin
    +  }
    +
    +  def initializeAggregateHashMap(): String = {
    +    val generatedSchema: String =
    --- End diff --
    
    That was my initial thought too, but this generated class only works for one schema because of the specialized equals/hash/find signatures. It's not particularly useful to pass in a schema if only one works.
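
As a rough illustration of why the signatures tie the generated class to one schema, the sketch below mirrors how `groupingKeySignature` joins "type name" pairs and how the placeholder hash method is rendered from them. The class name, method names, and key names are hypothetical, and the type strings are passed in directly rather than derived from a Spark `StructType`.

```java
// Hypothetical helper that mimics the string-building in the Scala generator.
final class GroupingKeySignatureSketch {
  // Joins "type name" pairs with ", ", like groupingKeySignature in the diff.
  static String signature(String[] types, String[] names) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < types.length; i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(types[i]).append(' ').append(names[i]);
    }
    return sb.toString();
  }

  // Renders the placeholder hash method for one specific list of keys.
  static String hashMethod(String[] types, String[] names) {
    return "private long hash(" + signature(types, names) + ") {\n"
        + "  return " + String.join(" & ", names) + ";\n"
        + "}";
  }
}
```

Two `long` keys produce `hash(long key0, long key1)`, while a single key produces `hash(long key0)`. The parameter list is fixed when the source text is produced, so a class generated for one grouping-key schema cannot accept keys from another.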




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207235417
  
    test this please




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207236603
  
    **[Test build #55327 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55327/consoleFull)** for PR 12161 at commit [`eb8a020`](https://github.com/apache/spark/commit/eb8a020abb5521bd71fa5d683d8bb3d5857a0287).




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207511433
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55354/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206026230
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55036/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207498660
  
    **[Test build #55354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55354/consoleFull)** for PR 12161 at commit [`ec74328`](https://github.com/apache/spark/commit/ec74328ab73766481d3aa7e566fe592bbde747eb).




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206024191
  
    **[Test build #55034 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55034/consoleFull)** for PR 12161 at commit [`a31be48`](https://github.com/apache/spark/commit/a31be487e9f369c0eb30c4e22df85765867a3478).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class TungstenAggregateHashMap(`
      * `       |public class $generatedClassName `




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207537480
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55371/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207159772
  
    **[Test build #55272 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55272/consoleFull)** for PR 12161 at commit [`071a900`](https://github.com/apache/spark/commit/071a90066eab9f672b561e7db0cab577bb9c38fe).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207321981
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207134229
  
    I think the old version makes more sense. The generated code only works for a particular schema, so there is no reason to pass it in.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207234721
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55318/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206275251
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55104/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207321884
  
    **[Test build #55336 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55336/consoleFull)** for PR 12161 at commit [`eb8a020`](https://github.com/apache/spark/commit/eb8a020abb5521bd71fa5d683d8bb3d5857a0287).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207146943
  
    LGTM




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12161#discussion_r58978809
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregateHashMap.scala ---
    @@ -0,0 +1,132 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.aggregate
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
    +import org.apache.spark.sql.types.StructType
    +
    +class TungstenAggregateHashMap(
    +    ctx: CodegenContext,
    +    generatedClassName: String,
    +    groupingKeySchema: StructType,
    +    bufferSchema: StructType) {
    +  val groupingKeys = groupingKeySchema.map(key => (key.dataType.typeName, ctx.freshName("key")))
    +  val bufferValues = bufferSchema.map(key => (ctx.freshName("value"), key.dataType.typeName))
    +  val groupingKeySignature = groupingKeys.map(_.productIterator.toList.mkString(" ")).mkString(", ")
    +
    +  def generate(): String = {
    +    s"""
    +       |public class $generatedClassName {
    +       |${initializeAggregateHashMap()}
    +       |
    +       |${generateFindOrInsert()}
    +       |
    +       |${generateEquals()}
    +       |
    +       |${generateHashFunction()}
    +       |}
    +     """.stripMargin
    +  }
    +
    +  def initializeAggregateHashMap(): String = {
    +    val generatedSchema: String =
    --- End diff --
    
    @nongli how come you asked him to revert to the generated schema? It looks pretty weird to generate code to create the schema when it is already available outside codegen.
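
To make the "generated schema" under discussion concrete, the following sketch renders the kind of Java expression that `initializeAggregateHashMap` splices into the `schema` field initializer. The helper class and the field names in the usage note are hypothetical; the data type names are passed as plain strings and are assumed to match constants on `org.apache.spark.sql.types.DataTypes`. The sketch does not escape quotes in field names.

```java
// Hypothetical helper showing the source text emitted for the schema field.
final class SchemaExpressionSketch {
  // Builds "new StructType().add(name, DataTypes.X)...;" as Java source text.
  static String render(String[] fieldNames, String[] dataTypeNames) {
    StringBuilder sb = new StringBuilder("new org.apache.spark.sql.types.StructType()");
    for (int i = 0; i < fieldNames.length; i++) {
      sb.append("\n  .add(\"").append(fieldNames[i])
        .append("\", org.apache.spark.sql.types.DataTypes.")
        .append(dataTypeNames[i]).append(")");
    }
    return sb.append(";").toString();
  }
}
```

For a grouping key `k` and a buffer field `sum`, both `LongType`, `render` yields a three-line expression that rebuilds the same schema inside the generated class. The alternative raised in this comment is to hand the existing `StructType` object to the generated class instead of re-creating it from source text.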




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12161#discussion_r58788712
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregateHashMap.scala ---
    @@ -0,0 +1,132 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.aggregate
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
    +import org.apache.spark.sql.types.StructType
    +
    +class TungstenAggregateHashMap(
    +    ctx: CodegenContext,
    +    generatedClassName: String,
    +    groupingKeySchema: StructType,
    +    bufferSchema: StructType) {
    +  val groupingKeys = groupingKeySchema.map(key => (key.dataType.typeName, ctx.freshName("key")))
    +  val bufferValues = bufferSchema.map(key => (ctx.freshName("value"), key.dataType.typeName))
    +  val groupingKeySignature = groupingKeys.map(_.productIterator.toList.mkString(" ")).mkString(", ")
    +
    +  def generate(): String = {
    +    s"""
    +       |public class $generatedClassName {
    +       |${initializeAggregateHashMap()}
    +       |
    +       |${generateFindOrInsert()}
    +       |
    +       |${generateEquals()}
    +       |
    +       |${generateHashFunction()}
    +       |}
    +     """.stripMargin
    +  }
    +
    +  def initializeAggregateHashMap(): String = {
    +    val generatedSchema: String =
    +      s"""
    +         |new org.apache.spark.sql.types.StructType()
    +         |${(groupingKeySchema ++ bufferSchema).map(key =>
    +            s""".add("${key.name}", org.apache.spark.sql.types.DataTypes.${key.dataType})""")
    +            .mkString("\n")};
    +       """.stripMargin
    +
    +    s"""
    +       |  private org.apache.spark.sql.execution.vectorized.ColumnarBatch batch;
    +       |  private int[] buckets;
    +       |  private int numBuckets;
    +       |  private int maxSteps;
    +       |  private int numRows = 0;
    +       |  private org.apache.spark.sql.types.StructType schema = $generatedSchema
    +       |
    +       |  public $generatedClassName(int capacity, double loadFactor, int maxSteps) {
    +       |    assert (capacity > 0 && ((capacity & (capacity - 1)) == 0));
    +       |    this.maxSteps = maxSteps;
    +       |    numBuckets = (int) (capacity / loadFactor);
    +       |    batch = org.apache.spark.sql.execution.vectorized.ColumnarBatch.allocate(schema,
    +       |      org.apache.spark.memory.MemoryMode.ON_HEAP, capacity);
    +       |    buckets = new int[numBuckets];
    +       |    java.util.Arrays.fill(buckets, -1);
    +       |  }
    +       |
    +       |  public $generatedClassName() {
    +       |    this(1 << 16, 0.25, 5);
    +       |  }
    +     """.stripMargin
    +  }
    +
    +  def generateHashFunction(): String = {
    +    s"""
    +       |// TODO: Improve this Hash Function
    +       |private long hash($groupingKeySignature) {
    +       |  return ${groupingKeys.map(_._2).mkString(" & ")};
    +       |}
    +     """.stripMargin
    +  }
    +
    +  def generateEquals(): String = {
    +    s"""
    +       |private boolean equals(int idx, $groupingKeySignature) {
    +       |  return ${groupingKeys.zipWithIndex.map(key =>
    +            s"batch.column(${key._2}).getLong(buckets[idx]) == ${key._1._2}").mkString(" && ")};
    +       |}
    +     """.stripMargin
    +  }
    +
    +  def generateFindOrInsert(): String = {
    +    s"""
    +       |public org.apache.spark.sql.execution.vectorized.ColumnarBatch.Row findOrInsert(${
    +          groupingKeySignature}) {
    +       |  int idx = find(${groupingKeys.map(_._2).mkString(", ")});
    +       |  if (idx != -1 && buckets[idx] == -1) {
    +       |    ${groupingKeys.zipWithIndex.map(key =>
    +              s"batch.column(${key._2}).putLong(numRows, ${key._1._2});").mkString("\n")}
    +       |    ${bufferValues.zipWithIndex.map(key =>
    +              s"batch.column(${groupingKeys.length + key._2}).putLong(numRows, 0);")
    +              .mkString("\n")}
    +       |    buckets[idx] = numRows++;
    +       |  }
    +       |  return batch.getRow(buckets[idx]);
    +       |}
    +       |
    +       |private int find($groupingKeySignature) {
    --- End diff --
    
    Let's simplify this. The generated code only needs findOrInsert() and doesn't need find.
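
One way the suggestion above could look, as a minimal sketch rather than the PR's actual generated code: the probe loop is folded into findOrInsert so no separate find() is needed. The class name `LongKeyAggMapSketch`, the single long key and single long buffer, the plain arrays standing in for ColumnarBatch, the placeholder hash, and the -1 return when probing gives up are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical stand-in for the generated class: one long grouping key and one
// long aggregation buffer, stored in plain arrays instead of a ColumnarBatch.
class LongKeyAggMapSketch {
  private final long[] keys;    // grouping-key column, indexed by row
  private final long[] values;  // aggregation-buffer column, indexed by row
  private final int[] buckets;  // bucket -> row index, -1 marks an empty bucket
  private final int numBuckets;
  private final int maxSteps;
  private int numRows = 0;

  LongKeyAggMapSketch(int capacity, double loadFactor, int maxSteps) {
    if (capacity <= 0 || (capacity & (capacity - 1)) != 0) {
      throw new IllegalArgumentException("capacity must be a positive power of two");
    }
    this.maxSteps = maxSteps;
    this.numBuckets = (int) (capacity / loadFactor);
    this.keys = new long[capacity];
    this.values = new long[capacity];
    this.buckets = new int[numBuckets];
    Arrays.fill(buckets, -1);
  }

  // Placeholder hash (assumption); the diff itself carries a TODO to improve it.
  private static long hash(long key) {
    return key ^ (key >>> 32);
  }

  // Returns the row index for `key`, inserting a zero-initialized row if the key
  // is absent. Returns -1 if no slot is found within maxSteps probes or if the
  // row storage is full, so a caller can fall back to another map.
  int findOrInsert(long key) {
    int idx = (int) Math.floorMod(hash(key), (long) numBuckets);
    for (int step = 0; step < maxSteps; step++) {
      int row = buckets[idx];
      if (row == -1) {
        if (numRows == keys.length) {
          return -1;
        }
        keys[numRows] = key;
        values[numRows] = 0L;
        buckets[idx] = numRows;
        return numRows++;
      }
      if (keys[row] == key) {
        return row;
      }
      idx = (idx + 1) % numBuckets;  // linear probing
    }
    return -1;
  }

  void add(int row, long delta) {
    values[row] += delta;
  }

  long value(int row) {
    return values[row];
  }

  int size() {
    return numRows;
  }
}
```

A caller would update the aggregate with something like `map.add(map.findOrInsert(k), v)` after checking for a -1 result.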


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207534882
  
    test this please




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207511426
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206062012
  
    **[Test build #55068 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55068/consoleFull)** for PR 12161 at commit [`bd96657`](https://github.com/apache/spark/commit/bd96657854cc643547ddabd8efee6f645ee7a7ff).




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207321991
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55336/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207267628
  
    **[Test build #55327 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55327/consoleFull)** for PR 12161 at commit [`eb8a020`](https://github.com/apache/spark/commit/eb8a020abb5521bd71fa5d683d8bb3d5857a0287).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207579368
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12161#discussion_r58978492
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregateHashMap.scala ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.aggregate
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
    +import org.apache.spark.sql.types.StructType
    +
    +class TungstenAggregateHashMap(
    --- End diff --
    
    we should document how this thing works in the classdoc (i.e. explain the physical layout).
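
As a starting point for such a classdoc, the layout implied by the diff can be sketched as follows. The helper class `AggHashMapLayoutSketch` and its method names are hypothetical; only the two formulas (numBuckets = capacity / loadFactor, and buffer columns placed after the grouping-key columns) are taken from the code under review.

```java
/**
 * Layout sketch (hypothetical helper, not part of the PR):
 *
 *   buckets: int[numBuckets]
 *     Each slot holds a row index into the batch, or -1 if the slot is empty.
 *
 *   batch: column-major storage with `capacity` rows
 *     columns [0, numKeys)                    grouping keys
 *     columns [numKeys, numKeys + numBuffers) aggregation buffers
 *     Row i holds the keys and buffers of the i-th distinct group inserted.
 */
final class AggHashMapLayoutSketch {
  private AggHashMapLayoutSketch() {}

  // Number of hash buckets for a given row capacity and load factor,
  // mirroring `numBuckets = (int) (capacity / loadFactor)` in the diff.
  static int numBuckets(int capacity, double loadFactor) {
    return (int) (capacity / loadFactor);
  }

  // Column ordinal of the n-th aggregation buffer, mirroring
  // `batch.column(groupingKeys.length + n)` in the diff.
  static int bufferColumn(int numKeys, int bufferOrdinal) {
    return numKeys + bufferOrdinal;
  }
}
```

With the defaults used in the diff (capacity `1 << 16`, load factor 0.25), the bucket array is four times the row capacity, which keeps it sparse relative to the rows it indexes.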




[GitHub] spark pull request: [WIP][SPARK-14394][SQL] Generate AggregateHash...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-205573020
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54918/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207049242
  
    sorry -- I hadn't updated the PR description with the correct generated code. Please let me know if this is OK or if you still prefer the old version.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206275246
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206026226
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207276393
  
    **[Test build #55336 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55336/consoleFull)** for PR 12161 at commit [`eb8a020`](https://github.com/apache/spark/commit/eb8a020abb5521bd71fa5d683d8bb3d5857a0287).




[GitHub] spark pull request: [WIP][SPARK-14394][SQL] Generate AggregateHash...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-205995990
  
    Build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206190571
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206023097
  
    **[Test build #55034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55034/consoleFull)** for PR 12161 at commit [`a31be48`](https://github.com/apache/spark/commit/a31be487e9f369c0eb30c4e22df85765867a3478).




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206040332
  
    **[Test build #55042 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55042/consoleFull)** for PR 12161 at commit [`a31be48`](https://github.com/apache/spark/commit/a31be487e9f369c0eb30c4e22df85765867a3478).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class TungstenAggregateHashMap(`
      * `       |public class $generatedClassName `




[GitHub] spark pull request: [WIP][SPARK-14394][SQL] Generate AggregateHash...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-205995991
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55024/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207495473
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207550141
  
    test this please




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207234697
  
    **[Test build #55318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55318/consoleFull)** for PR 12161 at commit [`eb8a020`](https://github.com/apache/spark/commit/eb8a020abb5521bd71fa5d683d8bb3d5857a0287).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207495479
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55353/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207550406
  
    **[Test build #55380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55380/consoleFull)** for PR 12161 at commit [`ec74328`](https://github.com/apache/spark/commit/ec74328ab73766481d3aa7e566fe592bbde747eb).




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207579373
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55380/
    Test PASSed.




[GitHub] spark pull request: [WIP][SPARK-14394][SQL] Generate AggregateHash...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-205572143
  
    **[Test build #54918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54918/consoleFull)** for PR 12161 at commit [`8a47e1e`](https://github.com/apache/spark/commit/8a47e1ea9886a3389d7254da4391fa2446e74b40).




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206190355
  
    **[Test build #55094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55094/consoleFull)** for PR 12161 at commit [`e30d40d`](https://github.com/apache/spark/commit/e30d40db0c94b2121b6bbcb48ba1ebf7ac861246).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206190576
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55094/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207536033
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55369/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206032405
  
    **[Test build #55042 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55042/consoleFull)** for PR 12161 at commit [`a31be48`](https://github.com/apache/spark/commit/a31be487e9f369c0eb30c4e22df85765867a3478).




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207511168
  
    **[Test build #55354 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55354/consoleFull)** for PR 12161 at commit [`ec74328`](https://github.com/apache/spark/commit/ec74328ab73766481d3aa7e566fe592bbde747eb).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207536205
  
    test this please




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207234720
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12161#discussion_r58978685
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregateHashMap.scala ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.aggregate
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
    +import org.apache.spark.sql.types.StructType
    +
    +class TungstenAggregateHashMap(
    +    ctx: CodegenContext,
    +    generatedClassName: String,
    +    groupingKeySchema: StructType,
    +    bufferSchema: StructType) {
    +  val groupingKeys = groupingKeySchema.map(k => (k.dataType.typeName, ctx.freshName("key")))
    +  val bufferValues = bufferSchema.map(k => (k.dataType.typeName, ctx.freshName("value")))
    +  val groupingKeySignature = groupingKeys.map(_.productIterator.toList.mkString(" ")).mkString(", ")
    +
    +  def generate(): String = {
    +    s"""
    +       |public class $generatedClassName {
    +       |${initializeAggregateHashMap()}
    +       |
    +       |${generateFindOrInsert()}
    +       |
    +       |${generateEquals()}
    +       |
    +       |${generateHashFunction()}
    +       |}
    +     """.stripMargin
    +  }
    +
    +  def initializeAggregateHashMap(): String = {
    +    val generatedSchema: String =
    +      s"""
    +         |new org.apache.spark.sql.types.StructType()
    +         |${(groupingKeySchema ++ bufferSchema).map(key =>
    +          s""".add("${key.name}", org.apache.spark.sql.types.DataTypes.${key.dataType})""")
    +          .mkString("\n")};
    +      """.stripMargin
    +
    +    s"""
    +       |  private org.apache.spark.sql.execution.vectorized.ColumnarBatch batch;
    +       |  private int[] buckets;
    +       |  private int numBuckets;
    +       |  private int maxSteps;
    +       |  private int numRows = 0;
    +       |  private org.apache.spark.sql.types.StructType schema = $generatedSchema
    +       |
    +       |  public $generatedClassName(int capacity, double loadFactor, int maxSteps) {
    +       |    assert (capacity > 0 && ((capacity & (capacity - 1)) == 0));
    +       |    this.maxSteps = maxSteps;
    +       |    numBuckets = (int) (capacity / loadFactor);
    +       |    batch = org.apache.spark.sql.execution.vectorized.ColumnarBatch.allocate(schema,
    +       |      org.apache.spark.memory.MemoryMode.ON_HEAP, capacity);
    +       |    buckets = new int[numBuckets];
    +       |    java.util.Arrays.fill(buckets, -1);
    +       |  }
    +       |
    +       |  public $generatedClassName() {
    +       |    new $generatedClassName(1 << 16, 0.25, 5);
    +       |  }
    +     """.stripMargin
    +  }
    +
    +  def generateHashFunction(): String = {
    --- End diff --
    
    it'd be great to document the hash function (since it is more difficult to read the generated code)
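
A minimal sketch of what a documented hash function for two `long` grouping keys could look like. `KeyHashSketch`, `hash`, and `bucket` are hypothetical names chosen for illustration; this is not the hash function the PR ends up with, only one common way to mix keys so that, unlike `k1 & k2`, key order and high bits affect the result.

```java
// Hypothetical helper, not part of the PR: mixes two long grouping keys into one hash.
final class KeyHashSketch {
    private KeyHashSketch() {}

    /**
     * Combines k1 and k2 with a 31-multiplier polynomial scheme.
     * Swapping the keys changes the result, which is not true of k1 & k2.
     */
    static long hash(long k1, long k2) {
        long h = 17;
        h = 31 * h + Long.hashCode(k1);
        h = 31 * h + Long.hashCode(k2);
        // Fold higher bits into the low bits, since bucket() only keeps the low bits.
        return h ^ (h >>> 16);
    }

    /** Maps a hash to a bucket index; numBuckets is assumed to be a power of two. */
    static int bucket(long h, int numBuckets) {
        return (int) h & (numBuckets - 1);
    }
}
```

Keeping the Javadoc on the emitted method itself means the explanation travels with the generated source, which is where a reader debugging the codegen output would look for it.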




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206026222
  
    **[Test build #55036 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55036/consoleFull)** for PR 12161 at commit [`a31be48`](https://github.com/apache/spark/commit/a31be487e9f369c0eb30c4e22df85765867a3478).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class TungstenAggregateHashMap(`
      * `       |public class $generatedClassName `




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206706738
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206080184
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55068/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206030967
  
    test this please




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207136918
  
    **[Test build #55272 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55272/consoleFull)** for PR 12161 at commit [`071a900`](https://github.com/apache/spark/commit/071a90066eab9f672b561e7db0cab577bb9c38fe).




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12161




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207160035
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206691029
  
    **[Test build #55179 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55179/consoleFull)** for PR 12161 at commit [`cae66fd`](https://github.com/apache/spark/commit/cae66fd299cb770b2bef75d258947d7f1ecfe36e).




[GitHub] spark pull request: [WIP][SPARK-14394][SQL] Generate AggregateHash...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-205573015
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206706739
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55180/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207497421
  
    test this please




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207218514
  
    **[Test build #55318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55318/consoleFull)** for PR 12161 at commit [`eb8a020`](https://github.com/apache/spark/commit/eb8a020abb5521bd71fa5d683d8bb3d5857a0287).




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12161#discussion_r58981138
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregateHashMap.scala ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.aggregate
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
    +import org.apache.spark.sql.types.StructType
    +
    +class TungstenAggregateHashMap(
    --- End diff --
    
    Added docs, renames etc. The reason I made it a class was because there was a lot of shared state that'd otherwise have to be passed around in all the functions (`groupingKeys`, `bufferValues`, `groupingKeySignature` etc.).
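
To illustrate the shared-state argument, here is a rough Java sketch of a generator object that computes the key list once and lets every method read it, instead of threading it through each call. `HashMapGeneratorSketch` and its method names are hypothetical stand-ins; the real class is written in Scala and uses `CodegenContext.freshName` for the variable names, which this sketch replaces with a simple counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical stand-in for the generator class; names mirror the comment, not Spark's API.
final class HashMapGeneratorSketch {
    private final String generatedClassName;
    // Each entry is a {javaType, variableName} pair, built once in the constructor.
    private final List<String[]> groupingKeys = new ArrayList<>();

    HashMapGeneratorSketch(String generatedClassName, String... keyTypes) {
        this.generatedClassName = generatedClassName;
        for (int i = 0; i < keyTypes.length; i++) {
            groupingKeys.add(new String[] {keyTypes[i], "key" + i});
        }
    }

    /** Renders the keys as a parameter list, e.g. "long key0, long key1". */
    String groupingKeySignature() {
        StringJoiner signature = new StringJoiner(", ");
        for (String[] key : groupingKeys) {
            signature.add(key[0] + " " + key[1]);
        }
        return signature.toString();
    }

    /** Both generator methods below reuse the shared state without extra arguments. */
    String generateFindSignature() {
        return "private int find(" + groupingKeySignature() + ")";
    }

    String generateClassHeader() {
        return "public class " + generatedClassName + " {";
    }
}
```

With plain functions, each of `generateFindOrInsert`, `generateEquals`, and `generateHashFunction` would need the same three or four parameters; holding them as fields keeps those signatures empty.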




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206142648
  
    **[Test build #55094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55094/consoleFull)** for PR 12161 at commit [`e30d40d`](https://github.com/apache/spark/commit/e30d40db0c94b2121b6bbcb48ba1ebf7ac861246).




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206024935
  
    test this please




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12161#discussion_r58978122
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregateHashMap.scala ---
    @@ -0,0 +1,132 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.aggregate
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
    +import org.apache.spark.sql.types.StructType
    +
    +class TungstenAggregateHashMap(
    +    ctx: CodegenContext,
    +    generatedClassName: String,
    +    groupingKeySchema: StructType,
    +    bufferSchema: StructType) {
    +  val groupingKeys = groupingKeySchema.map(key => (key.dataType.typeName, ctx.freshName("key")))
    +  val bufferValues = bufferSchema.map(key => (ctx.freshName("value"), key.dataType.typeName))
    +  val groupingKeySignature = groupingKeys.map(_.productIterator.toList.mkString(" ")).mkString(", ")
    +
    +  def generate(): String = {
    +    s"""
    +       |public class $generatedClassName {
    +       |${initializeAggregateHashMap()}
    +       |
    +       |${generateFindOrInsert()}
    +       |
    +       |${generateEquals()}
    +       |
    +       |${generateHashFunction()}
    +       |}
    +     """.stripMargin
    +  }
    +
    +  def initializeAggregateHashMap(): String = {
    +    val generatedSchema: String =
    --- End diff --
    
    Yea I agree it is weird to generate the schema on the fly. We should just pass the value in.





[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207537477
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206706059
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206949611
  
    The generated code takes a schema in the ctor and creates one as a member var. Let's just use the member var one like you had originally. 
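
A small sketch of the member-variable shape suggested here, under loud assumptions: `MemberSchemaMapSketch` is a hypothetical class, and a list of field names stands in for Spark's `StructType` so the example runs without Spark on the classpath. The schema is initialized in place as a field and no constructor accepts it. The sketch also delegates from the no-argument constructor with `this(...)`; in Java, writing `new SomeClass(...)` inside a constructor body builds a separate object and leaves the current one's fields at their defaults.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: a list of field names plays the role of Spark's StructType.
final class MemberSchemaMapSketch {
    // The schema is a member initialized in place, so no constructor takes it as an argument.
    private final List<String> schema = Arrays.asList("k1", "k2", "sum");
    private final int[] buckets;
    private final int maxSteps;

    MemberSchemaMapSketch(int capacity, double loadFactor, int maxSteps) {
        this.maxSteps = maxSteps;
        this.buckets = new int[(int) (capacity / loadFactor)];
        Arrays.fill(this.buckets, -1); // -1 marks an empty bucket
    }

    MemberSchemaMapSketch() {
        // Delegate to the three-argument constructor so this instance is the one initialized.
        this(1 << 16, 0.25, 5);
    }

    int numBuckets() { return buckets.length; }
    int maxSteps() { return maxSteps; }
    List<String> schema() { return schema; }
}
```

With the defaults above, a capacity of `1 << 16` and a load factor of `0.25` give `262144` buckets, which is still a power of two, so masking with `numBuckets - 1` remains valid.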




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206021282
  
    ```java
    /* 001 */ public Object generate(Object[] references) {
    /* 002 */   return new GeneratedIterator(references);
    /* 003 */ }
    /* 004 */ 
    /* 005 */ /** Codegened pipeline for:
    /* 006 */ * TungstenAggregate(key=[k1#29L,k2#30L], functions=[(sum(id#26L),mode=Final,isDistinct=false)], output=[k1#29L,k2#30L,sum(id)#34L]...
    /* 007 */   */
    /* 008 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
    /* 009 */   private Object[] references;
    /* 010 */   private boolean agg_initAgg;
    /* 011 */   private agg_AggregateHashMap agg_aggregateHashMap;
    /* 012 */   private org.apache.spark.sql.execution.aggregate.TungstenAggregate agg_plan;
    /* 013 */   private org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap agg_hashMap;
    /* 014 */   private org.apache.spark.sql.execution.UnsafeKVExternalSorter agg_sorter;
    /* 015 */   private org.apache.spark.unsafe.KVIterator agg_mapIter;
    /* 016 */   private scala.collection.Iterator inputadapter_input;
    /* 017 */   private UnsafeRow agg_result;
    /* 018 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder;
    /* 019 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter agg_rowWriter;
    /* 020 */   private UnsafeRow agg_result1;
    /* 021 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder1;
    /* 022 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter agg_rowWriter1;
    /* 023 */   private org.apache.spark.sql.execution.metric.LongSQLMetric wholestagecodegen_numOutputRows;
    /* 024 */   private org.apache.spark.sql.execution.metric.LongSQLMetricValue wholestagecodegen_metricValue;
    /* 025 */   
    /* 026 */   public GeneratedIterator(Object[] references) {
    /* 027 */     this.references = references;
    /* 028 */   }
    /* 029 */   
    /* 030 */   public void init(int index, scala.collection.Iterator inputs[]) {
    /* 031 */     partitionIndex = index;
    /* 032 */     agg_initAgg = false;
    /* 033 */     agg_aggregateHashMap = new agg_AggregateHashMap();
    /* 034 */     this.agg_plan = (org.apache.spark.sql.execution.aggregate.TungstenAggregate) references[0];
    /* 035 */     agg_hashMap = agg_plan.createHashMap();
    /* 036 */     
    /* 037 */     inputadapter_input = inputs[0];
    /* 038 */     agg_result = new UnsafeRow(2);
    /* 039 */     this.agg_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_result, 0);
    /* 040 */     this.agg_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_holder, 2);
    /* 041 */     agg_result1 = new UnsafeRow(3);
    /* 042 */     this.agg_holder1 = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(agg_result1, 0);
    /* 043 */     this.agg_rowWriter1 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(agg_holder1, 3);
    /* 044 */     this.wholestagecodegen_numOutputRows = (org.apache.spark.sql.execution.metric.LongSQLMetric) references[1];
    /* 045 */     wholestagecodegen_metricValue = (org.apache.spark.sql.execution.metric.LongSQLMetricValue) wholestagecodegen_numOutputRows.localValue();
    /* 046 */   }
    /* 047 */   
    /* 048 */   public class agg_AggregateHashMap {
    /* 049 */     private org.apache.spark.sql.execution.vectorized.ColumnarBatch batch;
    /* 050 */     private int[] buckets;
    /* 051 */     private int numBuckets;
    /* 052 */     private int maxSteps;
    /* 053 */     private int numRows = 0;
    /* 054 */     private org.apache.spark.sql.types.StructType schema =
    /* 055 */     new org.apache.spark.sql.types.StructType()
    /* 056 */     .add("k1", org.apache.spark.sql.types.DataTypes.LongType)
    /* 057 */     .add("k2", org.apache.spark.sql.types.DataTypes.LongType)
    /* 058 */     .add("sum", org.apache.spark.sql.types.DataTypes.LongType);
    /* 059 */     
    /* 060 */     public agg_AggregateHashMap(int capacity, double loadFactor, int maxSteps) {
    /* 061 */       assert (capacity > 0 && ((capacity & (capacity - 1)) == 0));
    /* 062 */       this.maxSteps = maxSteps;
    /* 063 */       numBuckets = (int) (capacity / loadFactor);
    /* 064 */       batch = org.apache.spark.sql.execution.vectorized.ColumnarBatch.allocate(schema,
    /* 065 */         org.apache.spark.memory.MemoryMode.ON_HEAP, capacity);
    /* 066 */       buckets = new int[numBuckets];
    /* 067 */       java.util.Arrays.fill(buckets, -1);
    /* 068 */     }
    /* 069 */     
    /* 070 */     public agg_AggregateHashMap() {
    /* 071 */       new agg_AggregateHashMap(1 << 16, 0.25, 5);
    /* 072 */     }
    /* 073 */     
    /* 074 */     public org.apache.spark.sql.execution.vectorized.ColumnarBatch.Row findOrInsert(long k1, long k2) {
    /* 075 */       int idx = find(k1, k2);
    /* 076 */       if (idx != -1 && buckets[idx] == -1) {
    /* 077 */         batch.column(0).putLong(numRows, k1);
    /* 078 */         batch.column(1).putLong(numRows, k2);
    /* 079 */         batch.column(2).putLong(numRows, 0);
    /* 080 */         buckets[idx] = numRows++;
    /* 081 */       }
    /* 082 */       return batch.getRow(buckets[idx]);
    /* 083 */     }
    /* 084 */     
    /* 085 */     private int find(long k1, long k2) {
    /* 086 */       long h = hash(k1, k2);
    /* 087 */       int step = 0;
    /* 088 */       int idx = (int) h & (numBuckets - 1);
    /* 089 */       while (step < maxSteps) {
    /* 090 */         // Return bucket index if it's either an empty slot or already contains the key
    /* 091 */         if (buckets[idx] == -1) {
    /* 092 */           return idx;
    /* 093 */         } else if (equals(idx, k1, k2)) {
    /* 094 */           return idx;
    /* 095 */         }
    /* 096 */         idx = (idx + 1) & (numBuckets - 1);
    /* 097 */         step++;
    /* 098 */       }
    /* 099 */       // Didn't find it
    /* 100 */       return -1;
    /* 101 */     }
    /* 102 */     
    /* 103 */     private boolean equals(int idx, long k1, long k2) {
    /* 104 */       return batch.column(0).getLong(buckets[idx]) == k1 && batch.column(1).getLong(buckets[idx]) == k2;
    /* 105 */     }
    /* 106 */     
    /* 107 */     // TODO: Improve this Hash Function
    /* 108 */     private long hash(long k1, long k2) {
    /* 109 */       return k1 & k2;
    /* 110 */     }
    /* 111 */     
    /* 112 */   }
    /* 113 */   
    /* 114 */   private void agg_doAggregateWithKeys() throws java.io.IOException {
    /* 115 */     /*** PRODUCE: INPUT */
    /* 116 */     
    /* 117 */     while (inputadapter_input.hasNext()) {
    /* 118 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
    /* 119 */       /*** CONSUME: TungstenAggregate(key=[k1#29L,k2#30L], functions=[(sum(id#26L),mode=Final,isDistinct=false)], output=[k1#29L,k2#30L,sum(id)#34L]... */
    /* 120 */       /* input[0, bigint] */
    /* 121 */       long inputadapter_value = inputadapter_row.getLong(0);
    /* 122 */       /* input[1, bigint] */
    /* 123 */       long inputadapter_value1 = inputadapter_row.getLong(1);
    /* 124 */       /* input[2, bigint] */
    /* 125 */       boolean inputadapter_isNull2 = inputadapter_row.isNullAt(2);
    /* 126 */       long inputadapter_value2 = inputadapter_isNull2 ? -1L : (inputadapter_row.getLong(2));
    /* 127 */       
    /* 128 */       // generate grouping key
    /* 129 */       agg_rowWriter.write(0, inputadapter_value);
    /* 130 */       
    /* 131 */       agg_rowWriter.write(1, inputadapter_value1);
    /* 132 */       /* hash(input[0, bigint], input[1, bigint], 42) */
    /* 133 */       int agg_value2 = 42;
    /* 134 */       
    /* 135 */       agg_value2 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashLong(inputadapter_value, agg_value2);
    /* 136 */       
    /* 137 */       agg_value2 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashLong(inputadapter_value1, agg_value2);
    /* 138 */       UnsafeRow agg_aggBuffer = null;
    /* 139 */       if (true) {
    /* 140 */         // try to get the buffer from hash map
    /* 141 */         agg_aggBuffer = agg_hashMap.getAggregationBufferFromUnsafeRow(agg_result, agg_value2);
    /* 142 */       }
    /* 143 */       if (agg_aggBuffer == null) {
    /* 144 */         if (agg_sorter == null) {
    /* 145 */           agg_sorter = agg_hashMap.destructAndCreateExternalSorter();
    /* 146 */         } else {
    /* 147 */           agg_sorter.merge(agg_hashMap.destructAndCreateExternalSorter());
    /* 148 */         }
    /* 149 */         
    /* 150 */         // the hash map had be spilled, it should have enough memory now,
    /* 151 */         // try  to allocate buffer again.
    /* 152 */         agg_aggBuffer = agg_hashMap.getAggregationBufferFromUnsafeRow(agg_result, agg_value2);
    /* 153 */         if (agg_aggBuffer == null) {
    /* 154 */           // failed to allocate the first page
    /* 155 */           throw new OutOfMemoryError("No enough memory for aggregation");
    /* 156 */         }
    /* 157 */       }
    /* 158 */       
    /* 159 */       // evaluate aggregate function
    /* 160 */       /* coalesce((coalesce(input[0, bigint], cast(0 as bigint)) + input[3, bigint]), input[0, bigint]) */
    /* 161 */       /* (coalesce(input[0, bigint], cast(0 as bigint)) + input[3, bigint]) */
    /* 162 */       boolean agg_isNull6 = true;
    /* 163 */       long agg_value6 = -1L;
    /* 164 */       /* coalesce(input[0, bigint], cast(0 as bigint)) */
    /* 165 */       /* input[0, bigint] */
    /* 166 */       boolean agg_isNull8 = agg_aggBuffer.isNullAt(0);
    /* 167 */       long agg_value8 = agg_isNull8 ? -1L : (agg_aggBuffer.getLong(0));
    /* 168 */       boolean agg_isNull7 = agg_isNull8;
    /* 169 */       long agg_value7 = agg_value8;
    /* 170 */       
    /* 171 */       if (agg_isNull7) {
    /* 172 */         /* cast(0 as bigint) */
    /* 173 */         boolean agg_isNull9 = false;
    /* 174 */         long agg_value9 = -1L;
    /* 175 */         if (!false) {
    /* 176 */           agg_value9 = (long) 0;
    /* 177 */         }
    /* 178 */         if (!agg_isNull9) {
    /* 179 */           agg_isNull7 = false;
    /* 180 */           agg_value7 = agg_value9;
    /* 181 */         }
    /* 182 */       }
    /* 183 */       
    /* 184 */       if (!inputadapter_isNull2) {
    /* 185 */         agg_isNull6 = false; // resultCode could change nullability.
    /* 186 */         agg_value6 = agg_value7 + inputadapter_value2;
    /* 187 */         
    /* 188 */       }
    /* 189 */       boolean agg_isNull5 = agg_isNull6;
    /* 190 */       long agg_value5 = agg_value6;
    /* 191 */       
    /* 192 */       if (agg_isNull5) {
    /* 193 */         /* input[0, bigint] */
    /* 194 */         boolean agg_isNull12 = agg_aggBuffer.isNullAt(0);
    /* 195 */         long agg_value12 = agg_isNull12 ? -1L : (agg_aggBuffer.getLong(0));
    /* 196 */         if (!agg_isNull12) {
    /* 197 */           agg_isNull5 = false;
    /* 198 */           agg_value5 = agg_value12;
    /* 199 */         }
    /* 200 */       }
    /* 201 */       // update aggregate buffer
    /* 202 */       if (!agg_isNull5) {
    /* 203 */         agg_aggBuffer.setLong(0, agg_value5);
    /* 204 */       } else {
    /* 205 */         agg_aggBuffer.setNullAt(0);
    /* 206 */       }
    /* 207 */       if (shouldStop()) return;
    /* 208 */     }
    /* 209 */     
    /* 210 */     agg_mapIter = agg_plan.finishAggregate(agg_hashMap, agg_sorter);
    /* 211 */   }
    /* 212 */   
    /* 213 */   protected void processNext() throws java.io.IOException {
    /* 214 */     /*** PRODUCE: TungstenAggregate(key=[k1#29L,k2#30L], functions=[(sum(id#26L),mode=Final,isDistinct=false)], output=[k1#29L,k2#30L,sum(id)#34L]... */
    /* 215 */     
    /* 216 */     if (!agg_initAgg) {
    /* 217 */       agg_initAgg = true;
    /* 218 */       agg_doAggregateWithKeys();
    /* 219 */     }
    /* 220 */     
    /* 221 */     // output the result
    /* 222 */     while (agg_mapIter.next()) {
    /* 223 */       wholestagecodegen_metricValue.add(1);
    /* 224 */       UnsafeRow agg_aggKey = (UnsafeRow) agg_mapIter.getKey();
    /* 225 */       UnsafeRow agg_aggBuffer1 = (UnsafeRow) agg_mapIter.getValue();
    /* 226 */       
    /* 227 */       /* input[0, bigint] */
    /* 228 */       long agg_value13 = agg_aggKey.getLong(0);
    /* 229 */       /* input[1, bigint] */
    /* 230 */       long agg_value14 = agg_aggKey.getLong(1);
    /* 231 */       /* input[0, bigint] */
    /* 232 */       boolean agg_isNull15 = agg_aggBuffer1.isNullAt(0);
    /* 233 */       long agg_value15 = agg_isNull15 ? -1L : (agg_aggBuffer1.getLong(0));
    /* 234 */       
    /* 235 */       /*** CONSUME: WholeStageCodegen */
    /* 236 */       
    /* 237 */       agg_rowWriter1.zeroOutNullBytes();
    /* 238 */       
    /* 239 */       agg_rowWriter1.write(0, agg_value13);
    /* 240 */       
    /* 241 */       agg_rowWriter1.write(1, agg_value14);
    /* 242 */       
    /* 243 */       if (agg_isNull15) {
    /* 244 */         agg_rowWriter1.setNullAt(2);
    /* 245 */       } else {
    /* 246 */         agg_rowWriter1.write(2, agg_value15);
    /* 247 */       }
    /* 248 */       append(agg_result1);
    /* 249 */       
    /* 250 */       if (shouldStop()) return;
    /* 251 */     }
    /* 252 */     
    /* 253 */     agg_mapIter.close();
    /* 254 */     if (agg_sorter == null) {
    /* 255 */       agg_hashMap.free();
    /* 256 */     }
    /* 257 */   }
    /* 258 */ }
    ```
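
A minimal sketch of the probing scheme in the generated `agg_AggregateHashMap` listing above, written as plain Java with primitive arrays standing in for the `ColumnarBatch` columns. The class name `LongPairAggMap`, the `add`/`sum`/`size` helpers, and the hash mix are assumptions for illustration and are not part of the PR. Unlike the listing, the no-argument constructor delegates with `this(...)` (the `new ...` call at line 071 builds a second object and leaves the fields of the first one unset), and `findOrInsert` returns -1 when probing gives up rather than indexing `buckets[-1]`.

```java
import java.util.Arrays;

/** Illustrative stand-in for the generated map: two long keys, one long sum per row. */
final class LongPairAggMap {
    private final long[] k1s;     // column 0: first grouping key
    private final long[] k2s;     // column 1: second grouping key
    private final long[] sums;    // column 2: aggregation buffer
    private final int[] buckets;  // bucket -> row index, -1 when the bucket is empty
    private final int maxSteps;   // maximum number of buckets probed per lookup
    private int numRows = 0;

    LongPairAggMap(int capacity, double loadFactor, int maxSteps) {
        int numBuckets = (int) (capacity / loadFactor);
        // Masking with (numBuckets - 1) only works for powers of two.
        if (capacity <= 0 || numBuckets <= 0 || (numBuckets & (numBuckets - 1)) != 0) {
            throw new IllegalArgumentException("bucket count must be a positive power of two");
        }
        this.maxSteps = maxSteps;
        this.k1s = new long[capacity];
        this.k2s = new long[capacity];
        this.sums = new long[capacity];
        this.buckets = new int[numBuckets];
        Arrays.fill(buckets, -1);
    }

    LongPairAggMap() {
        this(1 << 16, 0.25, 5);  // delegate; do not allocate a second instance
    }

    /** Returns the row for (k1, k2), inserting a zeroed row if absent, or -1 on failure. */
    int findOrInsert(long k1, long k2) {
        int idx = find(k1, k2);
        if (idx == -1) {
            return -1;  // no free or matching bucket within maxSteps probes
        }
        if (buckets[idx] == -1) {
            if (numRows == k1s.length) {
                return -1;  // row storage is full
            }
            k1s[numRows] = k1;
            k2s[numRows] = k2;
            sums[numRows] = 0L;
            buckets[idx] = numRows++;
        }
        return buckets[idx];
    }

    void add(int row, long value) { sums[row] += value; }

    long sum(int row) { return sums[row]; }

    int size() { return numRows; }

    /** Linear probing: returns an empty or matching bucket index, or -1 after maxSteps. */
    private int find(long k1, long k2) {
        int mask = buckets.length - 1;
        int idx = hash(k1, k2) & mask;
        for (int step = 0; step < maxSteps; step++) {
            int row = buckets[idx];
            if (row == -1 || (k1s[row] == k1 && k2s[row] == k2)) {
                return idx;
            }
            idx = (idx + 1) & mask;
        }
        return -1;
    }

    // Assumed placeholder mix; the listing's own hash is marked TODO.
    private static int hash(long k1, long k2) {
        return 31 * Long.hashCode(k1) + Long.hashCode(k2);
    }
}
```

A caller would presumably treat a -1 result as a signal to fall back to the regular `agg_hashMap` path for that row.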


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206040363
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55042/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12161#discussion_r58978611
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregateHashMap.scala ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.aggregate
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
    +import org.apache.spark.sql.types.StructType
    +
    +class TungstenAggregateHashMap(
    --- End diff --
    
    and this should be an object with a single public function, generate, which takes the constructor arguments?
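
One way to read this suggestion, sketched in Java rather than Scala: a stateless holder with a single entry point, `generate`, that receives what the constructor currently takes. In Scala this would be an `object`. The parameter list here is cut down to a class name and key names (all keys assumed to be `long`), so it is an assumption for illustration and not the PR's signature; only the hash method is emitted.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stateless generator: no instance fields, one public entry point. */
final class ColumnarAggMapCodeGenerator {
    private ColumnarAggMapCodeGenerator() {}  // not instantiable, like a Scala object

    /** Returns Java source for a class that exposes only the XOR-based hash method. */
    static String generate(String generatedClassName, List<String> keyNames) {
        // e.g. "long k1, long k2"
        String signature = keyNames.stream()
            .map(k -> "long " + k)
            .collect(Collectors.joining(", "));
        // e.g. "k1 ^ k2", mirroring the mkString(" ^ ") in the diff
        String hashExpr = String.join(" ^ ", keyNames);
        return "public class " + generatedClassName + " {\n"
            + "  private long hash(" + signature + ") {\n"
            + "    return " + hashExpr + ";\n"
            + "  }\n"
            + "}\n";
    }
}
```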





[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207578866
  
    **[Test build #55380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55380/consoleFull)** for PR 12161 at commit [`ec74328`](https://github.com/apache/spark/commit/ec74328ab73766481d3aa7e566fe592bbde747eb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206706611
  
    **[Test build #55180 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55180/consoleFull)** for PR 12161 at commit [`ff6ebbe`](https://github.com/apache/spark/commit/ff6ebbe598e09cfe16667669e2a119e02612bd26).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207274783
  
    Seems like amp-jenkins-worker-06 is in a bad state. test this please




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206040360
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206705916
  
    **[Test build #55179 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55179/consoleFull)** for PR 12161 at commit [`cae66fd`](https://github.com/apache/spark/commit/cae66fd299cb770b2bef75d258947d7f1ecfe36e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206037766
  
    cc @nongli 




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207536028
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12161#discussion_r58978578
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregateHashMap.scala ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.aggregate
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
    +import org.apache.spark.sql.types.StructType
    +
    +class TungstenAggregateHashMap(
    --- End diff --
    
    also maybe this should be called ColumnarAggMapCodeGenerator?




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by nongli <gi...@git.apache.org>.
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12161#discussion_r58788879
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregateHashMap.scala ---
    @@ -0,0 +1,132 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.aggregate
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
    +import org.apache.spark.sql.types.StructType
    +
    +class TungstenAggregateHashMap(
    +    ctx: CodegenContext,
    +    generatedClassName: String,
    +    groupingKeySchema: StructType,
    +    bufferSchema: StructType) {
    +  val groupingKeys = groupingKeySchema.map(key => (key.dataType.typeName, ctx.freshName("key")))
    +  val bufferValues = bufferSchema.map(key => (ctx.freshName("value"), key.dataType.typeName))
    +  val groupingKeySignature = groupingKeys.map(_.productIterator.toList.mkString(" ")).mkString(", ")
    +
    +  def generate(): String = {
    +    s"""
    +       |public class $generatedClassName {
    +       |${initializeAggregateHashMap()}
    +       |
    +       |${generateFindOrInsert()}
    +       |
    +       |${generateEquals()}
    +       |
    +       |${generateHashFunction()}
    +       |}
    +     """.stripMargin
    +  }
    +
    +  def initializeAggregateHashMap(): String = {
    +    val generatedSchema: String =
    --- End diff --
    
    I don't think this should be generated. I think the generated ctor should take a schema and we should get that from the non-generated code if possible.
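
A rough sketch of how the emitted constructor might look if the schema were passed in instead of being rebuilt inside the generated source. It is written in Java for illustration; `SchemaCtorSketch` and `generateConstructor` are hypothetical names, and the emitted text assumes the `batch`, `buckets`, `numBuckets` and `maxSteps` fields are declared elsewhere in the generated class, as in the diff.

```java
/** Hypothetical helper that emits a schema-taking constructor for the generated class. */
final class SchemaCtorSketch {
    private SchemaCtorSketch() {}

    /** Returns source text for the schema field and a constructor that receives it. */
    static String generateConstructor(String generatedClassName) {
        return String.join("\n",
            "  private org.apache.spark.sql.types.StructType schema;",
            "",
            "  public " + generatedClassName + "(",
            "      org.apache.spark.sql.types.StructType schema,",
            "      int capacity, double loadFactor, int maxSteps) {",
            "    assert (capacity > 0 && ((capacity & (capacity - 1)) == 0));",
            "    this.schema = schema;  // supplied by the non-generated caller",
            "    this.maxSteps = maxSteps;",
            "    numBuckets = (int) (capacity / loadFactor);",
            "    batch = org.apache.spark.sql.execution.vectorized.ColumnarBatch.allocate(schema,",
            "      org.apache.spark.memory.MemoryMode.ON_HEAP, capacity);",
            "    buckets = new int[numBuckets];",
            "    java.util.Arrays.fill(buckets, -1);",
            "  }",
            "");
    }
}
```

With this shape the generator no longer needs to render `.add(name, type)` calls as strings; how the schema object reaches the generated code (for example through the codegen references array) is left open here.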




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207136541
  
    sure, added it back




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206080177
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206024196
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207160037
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55272/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-207267717
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55327/
    Test FAILed.




[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12161#discussion_r58978741
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregateHashMap.scala ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.aggregate
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
    +import org.apache.spark.sql.types.StructType
    +
    +class TungstenAggregateHashMap(
    +    ctx: CodegenContext,
    +    generatedClassName: String,
    +    groupingKeySchema: StructType,
    +    bufferSchema: StructType) {
    +  val groupingKeys = groupingKeySchema.map(k => (k.dataType.typeName, ctx.freshName("key")))
    +  val bufferValues = bufferSchema.map(k => (k.dataType.typeName, ctx.freshName("value")))
    +  val groupingKeySignature = groupingKeys.map(_.productIterator.toList.mkString(" ")).mkString(", ")
    +
    +  def generate(): String = {
    +    s"""
    +       |public class $generatedClassName {
    +       |${initializeAggregateHashMap()}
    +       |
    +       |${generateFindOrInsert()}
    +       |
    +       |${generateEquals()}
    +       |
    +       |${generateHashFunction()}
    +       |}
    +     """.stripMargin
    +  }
    +
    +  def initializeAggregateHashMap(): String = {
    +    val generatedSchema: String =
    +      s"""
    +         |new org.apache.spark.sql.types.StructType()
    +         |${(groupingKeySchema ++ bufferSchema).map(key =>
    +          s""".add("${key.name}", org.apache.spark.sql.types.DataTypes.${key.dataType})""")
    +          .mkString("\n")};
    +      """.stripMargin
    +
    +    s"""
    +       |  private org.apache.spark.sql.execution.vectorized.ColumnarBatch batch;
    +       |  private int[] buckets;
    +       |  private int numBuckets;
    +       |  private int maxSteps;
    +       |  private int numRows = 0;
    +       |  private org.apache.spark.sql.types.StructType schema = $generatedSchema
    +       |
    +       |  public $generatedClassName(int capacity, double loadFactor, int maxSteps) {
    +       |    assert (capacity > 0 && ((capacity & (capacity - 1)) == 0));
    +       |    this.maxSteps = maxSteps;
    +       |    numBuckets = (int) (capacity / loadFactor);
    +       |    batch = org.apache.spark.sql.execution.vectorized.ColumnarBatch.allocate(schema,
    +       |      org.apache.spark.memory.MemoryMode.ON_HEAP, capacity);
    +       |    buckets = new int[numBuckets];
    +       |    java.util.Arrays.fill(buckets, -1);
    +       |  }
    +       |
    +       |  public $generatedClassName() {
    +       |    this(1 << 16, 0.25, 5);
    +       |  }
    +     """.stripMargin
    +  }
    +
    +  def generateHashFunction(): String = {
    +    s"""
    +       |// TODO: Improve this Hash Function
    +       |private long hash($groupingKeySignature) {
    +       |  return ${groupingKeys.map(_._2).mkString(" ^ ")};
    +       |}
    +     """.stripMargin
    +  }
    +
    +  def generateEquals(): String = {
    +    s"""
    +       |private boolean equals(int idx, $groupingKeySignature) {
    +       |  return ${groupingKeys.zipWithIndex.map(k =>
    +            s"batch.column(${k._2}).getLong(buckets[idx]) == ${k._1._2}").mkString(" && ")};
    +       |}
    +     """.stripMargin
    +  }
    +
    +  def generateFindOrInsert(): String = {
    +    s"""
    +       |public org.apache.spark.sql.execution.vectorized.ColumnarBatch.Row findOrInsert(${
    +          groupingKeySignature}) {
    +       |  long h = hash(${groupingKeys.map(_._2).mkString(", ")});
    +       |  int step = 0;
    +       |  int idx = (int) h & (numBuckets - 1);
    +       |  while (step < maxSteps) {
    +       |    // Return bucket index if it's either an empty slot or already contains the key
    +       |    if (buckets[idx] == -1) {
    +       |      ${groupingKeys.zipWithIndex.map(k =>
    +                s"batch.column(${k._2}).putLong(numRows, ${k._1._2});").mkString("\n")}
    +       |      ${bufferValues.zipWithIndex.map(k =>
    +                s"batch.column(${groupingKeys.length + k._2}).putLong(numRows, 0);")
    +                .mkString("\n")}
    +       |      buckets[idx] = numRows++;
    +       |      return batch.getRow(buckets[idx]);
    +       |    } else if (equals(idx, ${groupingKeys.map(_._2).mkString(", ")})) {
    +       |      return batch.getRow(buckets[idx]);
    +       |    }
    +       |    idx = (idx + 1) & (numBuckets - 1);
    +       |    step++;
    +       |  }
    +       |// Didn't find it
    +       |return null;
    --- End diff --
    
    indent is off here?
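
A minimal, hand-written sketch of the probing logic that the quoted `generateFindOrInsert()` template emits, assuming a single `long` grouping key and a single `long` buffer value. The class name `LongAggCache`, the plain arrays standing in for the `ColumnarBatch` columns, the identity hash, and the row-capacity guard are assumptions made for illustration and are not part of the patch. The no-argument constructor delegates with `this(...)`, which is how Java lets one constructor initialize the current instance through another.

```java
import java.util.Arrays;

// Hypothetical stand-in for the generated class: one long key, one long value.
final class LongAggCache {
    private final long[] keys;    // stands in for the key column of the batch
    private final long[] values;  // stands in for the aggregate buffer column
    private final int[] buckets;  // bucket -> row index, -1 marks an empty bucket
    private final int numBuckets;
    private final int maxSteps;
    private int numRows = 0;

    LongAggCache(int capacity, double loadFactor, int maxSteps) {
        if (capacity <= 0 || (capacity & (capacity - 1)) != 0) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        int n = (int) (capacity / loadFactor);
        if (n <= 0 || (n & (n - 1)) != 0) {
            // Masking with (n - 1) only works when n is a power of two.
            throw new IllegalArgumentException("capacity / loadFactor must be a power of two");
        }
        this.numBuckets = n;
        this.maxSteps = maxSteps;
        this.keys = new long[capacity];
        this.values = new long[capacity];
        this.buckets = new int[n];
        Arrays.fill(buckets, -1);
    }

    LongAggCache() {
        this(1 << 16, 0.25, 5); // delegate so that this instance is initialized
    }

    // Placeholder hash (assumption): the key itself, mirroring the cheap XOR in the diff.
    private static long hash(long key) {
        return key;
    }

    // Returns the row index for key, inserting it if needed; -1 means "fall back".
    int findOrInsert(long key) {
        int idx = (int) hash(key) & (numBuckets - 1);
        for (int step = 0; step < maxSteps; step++) {
            int row = buckets[idx];
            if (row == -1) {
                if (numRows == keys.length) {
                    return -1; // guard added in this sketch: no rows left
                }
                keys[numRows] = key;
                values[numRows] = 0L;
                buckets[idx] = numRows;
                return numRows++;
            } else if (keys[row] == key) {
                return row;
            }
            idx = (idx + 1) & (numBuckets - 1); // linear probing with wrap-around
        }
        return -1; // probe budget exhausted
    }

    void addTo(int row, long delta) {
        values[row] += delta;
    }

    long valueAt(int row) {
        return values[row];
    }

    public static void main(String[] args) {
        LongAggCache map = new LongAggCache(4, 0.5, 2); // 8 buckets, at most 2 probes
        System.out.println(map.findOrInsert(1L));  // 0
        System.out.println(map.findOrInsert(9L));  // 1 (collides with 1, lands one bucket later)
        System.out.println(map.findOrInsert(17L)); // -1 (both probed buckets are taken)
    }
}
```

A caller would treat `-1` the way the quoted code treats `null`: as a signal to use the slower, general-purpose map for that key.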



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-14394][SQL] Generate AggregateHash...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-205573003
  
    **[Test build #54918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54918/consoleFull)** for PR 12161 at commit [`8a47e1e`](https://github.com/apache/spark/commit/8a47e1ea9886a3389d7254da4391fa2446e74b40).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206024198
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55034/


[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206691544
  
    **[Test build #55180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55180/consoleFull)** for PR 12161 at commit [`ff6ebbe`](https://github.com/apache/spark/commit/ff6ebbe598e09cfe16667669e2a119e02612bd26).


[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206220304
  
    **[Test build #55104 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55104/consoleFull)** for PR 12161 at commit [`13b6b44`](https://github.com/apache/spark/commit/13b6b448fbfdef42688f9c46ba5c589fd4133b28).


[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12161#discussion_r59325685
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ColumnarAggMapCodeGenerator.scala ---
    @@ -0,0 +1,193 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.aggregate
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
    +import org.apache.spark.sql.types.StructType
    +
    +/**
    + * This is a helper object to generate an append-only single-key/single value aggregate hash
    + * map that can act as a 'cache' for extremely fast key-value lookups while evaluating aggregates
    + * (and fall back to the `BytesToBytesMap` if a given key isn't found). This is 'codegened' in
    + * TungstenAggregate to speed up aggregates w/ key.
    + *
    + * It is backed by a power-of-2-sized array for index lookups and a columnar batch that stores the
    + * key-value pairs. The index lookups in the array rely on linear probing (with a small number of
    + * maximum tries) and use an inexpensive hash function which makes it really efficient for a
    + * majority of lookups. However, using linear probing and an inexpensive hash function also makes it
    + * less robust as compared to the `BytesToBytesMap` (especially for a large number of keys or even
    + * for certain distribution of keys) and requires us to fall back on the latter for correctness.
    + */
    +class ColumnarAggMapCodeGenerator(
    --- End diff --
    
    everything in execution is private
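
The class comment quoted above describes a two-level scheme: a small, fast map serves most lookups, and keys it cannot place go to a more robust map. A rough sketch of that control flow for a per-key sum follows, assuming `long` keys and values. `TwoLevelSum` is a hypothetical name, `java.util.HashMap` merely stands in for `BytesToBytesMap`, and the fast path stores keys directly in its slots rather than in a columnar batch, so this illustrates the idea only and is not the code generated by the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of "fast cache first, robust map as fallback".
final class TwoLevelSum {
    private final long[] keys;
    private final long[] sums;
    private final boolean[] used;
    private final int mask;
    private final int maxSteps;
    // Stand-in for the general-purpose map (BytesToBytesMap in the quoted comment).
    private final Map<Long, Long> fallback = new HashMap<>();

    TwoLevelSum(int numSlots, int maxSteps) {
        if (numSlots <= 0 || (numSlots & (numSlots - 1)) != 0) {
            throw new IllegalArgumentException("numSlots must be a positive power of two");
        }
        this.keys = new long[numSlots];
        this.sums = new long[numSlots];
        this.used = new boolean[numSlots];
        this.mask = numSlots - 1;
        this.maxSteps = maxSteps;
    }

    void add(long key, long value) {
        int idx = (int) key & mask; // cheap placeholder hash (assumption)
        for (int step = 0; step < maxSteps; step++) {
            if (!used[idx]) {           // empty slot: claim it for this key
                used[idx] = true;
                keys[idx] = key;
                sums[idx] = value;
                return;
            }
            if (keys[idx] == key) {     // key already cached: update in place
                sums[idx] += value;
                return;
            }
            idx = (idx + 1) & mask;     // bounded linear probing
        }
        // Fast path gave up; the robust map keeps the result correct.
        fallback.merge(key, value, Long::sum);
    }

    long get(long key) {
        int idx = (int) key & mask;
        for (int step = 0; step < maxSteps; step++) {
            if (used[idx] && keys[idx] == key) {
                return sums[idx];
            }
            idx = (idx + 1) & mask;
        }
        return fallback.getOrDefault(key, 0L);
    }

    int fallbackSize() {
        return fallback.size();
    }

    public static void main(String[] args) {
        TwoLevelSum agg = new TwoLevelSum(4, 2);
        agg.add(1, 10);
        agg.add(5, 20); // collides with key 1, placed in the next slot
        agg.add(9, 30); // both probed slots are taken, goes to the fallback map
        agg.add(9, 5);
        System.out.println(agg.get(9)); // 35
    }
}
```

Because the fast map is append-only, a key that was once sent to the fallback keeps going there: the slots it probes never become free again, so each key lives in exactly one of the two maps.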


[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206706060
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55179/


[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206079654
  
    **[Test build #55068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55068/consoleFull)** for PR 12161 at commit [`bd96657`](https://github.com/apache/spark/commit/bd96657854cc643547ddabd8efee6f645ee7a7ff).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class TungstenAggregateHashMap(`
      * `       |public class $generatedClassName `


[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12161#issuecomment-206274280
  
    **[Test build #55104 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55104/consoleFull)** for PR 12161 at commit [`13b6b44`](https://github.com/apache/spark/commit/13b6b448fbfdef42688f9c46ba5c589fd4133b28).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-14394][SQL] Generate AggregateHashMap c...

Posted by tedyu <gi...@git.apache.org>.
Github user tedyu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12161#discussion_r59094076
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ColumnarAggMapCodeGenerator.scala ---
    @@ -0,0 +1,193 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.aggregate
    +
    +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
    +import org.apache.spark.sql.types.StructType
    +
    +/**
    + * This is a helper object to generate an append-only single-key/single value aggregate hash
    + * map that can act as a 'cache' for extremely fast key-value lookups while evaluating aggregates
    + * (and fall back to the `BytesToBytesMap` if a given key isn't found). This is 'codegened' in
    + * TungstenAggregate to speed up aggregates w/ key.
    + *
    + * It is backed by a power-of-2-sized array for index lookups and a columnar batch that stores the
    + * key-value pairs. The index lookups in the array rely on linear probing (with a small number of
    + * maximum tries) and use an inexpensive hash function which makes it really efficient for a
    + * majority of lookups. However, using linear probing and an inexpensive hash function also makes it
    + * less robust as compared to the `BytesToBytesMap` (especially for a large number of keys or even
    + * for certain distribution of keys) and requires us to fall back on the latter for correctness.
    + */
    +class ColumnarAggMapCodeGenerator(
    --- End diff --
    
    This class can be private, right?

