Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2016/02/12 01:49:54 UTC

[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/11177

    [SPARK-13293] [SQL] generate Expand

    Expand suffers from creating an UnsafeRow from the same input multiple times; with codegen, it only needs to copy some of the columns.
    
    After this change, we see a 3X improvement (from 43 seconds to 13 seconds) on a TPCDS query (Q67) that has eight columns in its Rollup.
    
    Ideally, we could mask some of the columns based on a bitmask; I'll leave that for the future, because Aggregation (50 ns) is currently much slower than just copying the variables (1-2 ns).
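
    For context, here is a minimal logical model of what Expand computes (an illustrative sketch with made-up types, not the actual operator):
    
    ```scala
    // Each input row is emitted once per projection. Without codegen, every
    // emitted row builds a full UnsafeRow, even though many columns are shared
    // across projections; the generated version copies only the columns that
    // actually differ.
    object ExpandSketch {
      type Row = Seq[Any]
      type Projection = Row => Row
    
      def expand(input: Iterator[Row], projections: Seq[Projection]): Iterator[Row] =
        input.flatMap(row => projections.iterator.map(p => p(row)))
    }
    ```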

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark gen_expand

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11177.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11177
    
----
commit 22ceda9a82c050abbe0d885513a713e9c2dceb29
Author: Davies Liu <da...@databricks.com>
Date:   2016-02-11T23:32:21Z

    generate Expand

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11177#issuecomment-183152562
  
    **[Test build #51156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51156/consoleFull)** for PR 11177 at commit [`22ceda9`](https://github.com/apache/spark/commit/22ceda9a82c050abbe0d885513a713e9c2dceb29).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11177#discussion_r52806016
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Expand.scala ---
    @@ -71,9 +80,86 @@ case class Expand(
                 idx = 0
               }
     
    +          numOutputRows += 1
               result
             }
           }
         }
       }
    +
    +  override def upstream(): RDD[InternalRow] = {
    +    child.asInstanceOf[CodegenSupport].upstream()
    +  }
    +
    +  protected override def doProduce(ctx: CodegenContext): String = {
    +    child.asInstanceOf[CodegenSupport].produce(ctx, this)
    +  }
    +
    +  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): String = {
    +    // Some columns have the same expression in all the projections, so collect the unique
    +    // expressions.
    +    val columnUniqueExpressions: IndexedSeq[Set[Expression]] = output.indices.map { i =>
    --- End diff --
    
    for this one, can we explain what the indexes are, and what the expressions are?


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11177#discussion_r52796788
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Expand.scala ---
    @@ -71,9 +80,76 @@ case class Expand(
                 idx = 0
               }
     
    +          numOutputRows += 1
               result
             }
           }
         }
       }
    +
    +  override def upstream(): RDD[InternalRow] = {
    +    child.asInstanceOf[CodegenSupport].upstream()
    +  }
    +
    +  protected override def doProduce(ctx: CodegenContext): String = {
    +    child.asInstanceOf[CodegenSupport].produce(ctx, this)
    +  }
    +
    +  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): String = {
    +    val uniqExprs: IndexedSeq[Set[Expression]] = output.indices.map { i =>
    +      projections.map(p => p(i)).toSet
    +    }
    +
    +    ctx.currentVars = input
    +    val resultVars = uniqExprs.zipWithIndex.map { case (exprs, i) =>
    +      val expr = exprs.head
    +      if (exprs.size == 1) {
    +        // it's common to have same expression for some columns in all the projections, for example,
    +        // GroupingSet will copy all the output from child as the first part of output.
    +        // We should only generate the columns once.
    +        BindReferences.bindReference(expr, child.output).gen(ctx)
    +      } else {
    +        val isNull = ctx.freshName("isNull")
    +        val value = ctx.freshName("value")
    +        val code =
    +          s"""
    +             |boolean $isNull = true;
    +             |${ctx.javaType(expr.dataType)} $value = ${ctx.defaultValue(expr.dataType)};
    +         """.stripMargin
    +        ExprCode(code, isNull, value)
    +      }
    +    }
    +
    +    // In order to prevent code exploration, we can't call `consume()` many times, so we call
    --- End diff --
    
    btw any perf degradation from not unrolling the loop?


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11177#issuecomment-183551777
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11177#issuecomment-183137573
  
    **[Test build #51156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51156/consoleFull)** for PR 11177 at commit [`22ceda9`](https://github.com/apache/spark/commit/22ceda9a82c050abbe0d885513a713e9c2dceb29).


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11177#issuecomment-183492718
  
    @rxin I have posted the generated code and added more comments.


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11177#issuecomment-183532637
  
    **[Test build #51219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51219/consoleFull)** for PR 11177 at commit [`ff2b5a4`](https://github.com/apache/spark/commit/ff2b5a441bd6c51b0eebffba1161189a60b88638).


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11177#issuecomment-183152706
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51156/
    Test PASSed.


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11177#discussion_r52699940
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Expand.scala ---
    @@ -71,9 +80,76 @@ case class Expand(
                 idx = 0
               }
     
    +          numOutputRows += 1
               result
             }
           }
         }
       }
    +
    +  override def upstream(): RDD[InternalRow] = {
    +    child.asInstanceOf[CodegenSupport].upstream()
    +  }
    +
    +  protected override def doProduce(ctx: CodegenContext): String = {
    +    child.asInstanceOf[CodegenSupport].produce(ctx, this)
    +  }
    +
    +  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): String = {
    +    val uniqExprs: IndexedSeq[Set[Expression]] = output.indices.map { i =>
    +      projections.map(p => p(i)).toSet
    +    }
    +
    +    ctx.currentVars = input
    +    val resultVars = uniqExprs.zipWithIndex.map { case (exprs, i) =>
    +      val expr = exprs.head
    +      if (exprs.size == 1) {
    +        // it's common to have same expression for some columns in all the projections, for example,
    +        // GroupingSet will copy all the output from child as the first part of output.
    +        // We should only generate the columns once.
    +        BindReferences.bindReference(expr, child.output).gen(ctx)
    +      } else {
    +        val isNull = ctx.freshName("isNull")
    +        val value = ctx.freshName("value")
    +        val code =
    +          s"""
    +             |boolean $isNull = true;
    +             |${ctx.javaType(expr.dataType)} $value = ${ctx.defaultValue(expr.dataType)};
    +         """.stripMargin
    +        ExprCode(code, isNull, value)
    +      }
    +    }
    +
    +    // In order to prevent code exploration, we can't call `consume()` many times, so we call
    --- End diff --
    
    what do you mean by "code exploration"?


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11177#issuecomment-183551314
  
    **[Test build #51219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51219/consoleFull)** for PR 11177 at commit [`ff2b5a4`](https://github.com/apache/spark/commit/ff2b5a441bd6c51b0eebffba1161189a60b88638).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11177#issuecomment-183520009
  
    **[Test build #51205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51205/consoleFull)** for PR 11177 at commit [`e1fd87d`](https://github.com/apache/spark/commit/e1fd87d45c37caa86d0d6bc287b73928595c755c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11177#discussion_r52819793
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Expand.scala ---
    @@ -17,11 +17,15 @@
     
     package org.apache.spark.sql.execution
     
    +import scala.collection.immutable.IndexedSeq
    --- End diff --
    
    this is no longer necessary. can you remove it in some other pr you have?


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11177


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11177#discussion_r52700130
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Expand.scala ---
    @@ -71,9 +80,76 @@ case class Expand(
                 idx = 0
               }
     
    +          numOutputRows += 1
               result
             }
           }
         }
       }
    +
    +  override def upstream(): RDD[InternalRow] = {
    +    child.asInstanceOf[CodegenSupport].upstream()
    +  }
    +
    +  protected override def doProduce(ctx: CodegenContext): String = {
    +    child.asInstanceOf[CodegenSupport].produce(ctx, this)
    +  }
    +
    +  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): String = {
    +    val uniqExprs: IndexedSeq[Set[Expression]] = output.indices.map { i =>
    +      projections.map(p => p(i)).toSet
    +    }
    +
    +    ctx.currentVars = input
    +    val resultVars = uniqExprs.zipWithIndex.map { case (exprs, i) =>
    +      val expr = exprs.head
    +      if (exprs.size == 1) {
    +        // it's common to have same expression for some columns in all the projections, for example,
    +        // GroupingSet will copy all the output from child as the first part of output.
    +        // We should only generate the columns once.
    +        BindReferences.bindReference(expr, child.output).gen(ctx)
    +      } else {
    +        val isNull = ctx.freshName("isNull")
    +        val value = ctx.freshName("value")
    +        val code =
    +          s"""
    +             |boolean $isNull = true;
    +             |${ctx.javaType(expr.dataType)} $value = ${ctx.defaultValue(expr.dataType)};
    +         """.stripMargin
    +        ExprCode(code, isNull, value)
    +      }
    +    }
    +
    +    // In order to prevent code exploration, we can't call `consume()` many times, so we call
    +    // that in a loop, and use swith/case to select the projections.
    +    val projectCodes = projections.zipWithIndex.map { case (exprs, i) =>
    --- End diff --
    
    i find the body of this loop pretty hard to understand. can we add some high level comment to explain what's going on?



[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11177#issuecomment-183150208
  
    As always, can you paste the generated code? :)



[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11177#issuecomment-183551779
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51219/
    Test PASSed.


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11177#discussion_r52799617
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Expand.scala ---
    @@ -71,9 +80,76 @@ case class Expand(
                 idx = 0
               }
     
    +          numOutputRows += 1
               result
             }
           }
         }
       }
    +
    +  override def upstream(): RDD[InternalRow] = {
    +    child.asInstanceOf[CodegenSupport].upstream()
    +  }
    +
    +  protected override def doProduce(ctx: CodegenContext): String = {
    +    child.asInstanceOf[CodegenSupport].produce(ctx, this)
    +  }
    +
    +  override def doConsume(ctx: CodegenContext, input: Seq[ExprCode]): String = {
    +    val uniqExprs: IndexedSeq[Set[Expression]] = output.indices.map { i =>
    +      projections.map(p => p(i)).toSet
    +    }
    +
    +    ctx.currentVars = input
    +    val resultVars = uniqExprs.zipWithIndex.map { case (exprs, i) =>
    +      val expr = exprs.head
    +      if (exprs.size == 1) {
    +        // it's common to have same expression for some columns in all the projections, for example,
    +        // GroupingSet will copy all the output from child as the first part of output.
    +        // We should only generate the columns once.
    +        BindReferences.bindReference(expr, child.output).gen(ctx)
    +      } else {
    +        val isNull = ctx.freshName("isNull")
    +        val value = ctx.freshName("value")
    +        val code =
    +          s"""
    +             |boolean $isNull = true;
    +             |${ctx.javaType(expr.dataType)} $value = ${ctx.defaultValue(expr.dataType)};
    +         """.stripMargin
    +        ExprCode(code, isNull, value)
    +      }
    +    }
    +
    +    // In order to prevent code exploration, we can't call `consume()` many times, so we call
    --- End diff --
    
    The loop plus copying two variables should only take about 1-2 nanoseconds, so there should be no regression.
    
    But without the loop, the generated code could easily grow larger than 8K, and that could be a regression (slower than without codegen).
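    
    To illustrate the tradeoff, here is a hand-written sketch of the looped form (illustrative helper names, not Spark's actual CodegenContext API): the per-projection column assignments are unrolled inside a switch, but the potentially large downstream `consume()` body is emitted exactly once, so the method size stays roughly constant in the number of projections.
    
    ```scala
    // Illustrative pseudo-codegen only. `assignments(i)` stands for the code
    // that copies projection i's columns into the shared expand_* variables;
    // `consumeBody` stands for the downstream operators' generated code.
    object LoopedExpandCodegen {
      def emit(assignments: Seq[String], consumeBody: String): String = {
        val cases = assignments.zipWithIndex.map { case (code, i) =>
          s"case $i:\n$code\n  break;"
        }.mkString("\n")
        s"""for (int i = 0; i < ${assignments.size}; i++) {
           |  switch (i) {
           |$cases
           |  }
           |$consumeBody
           |}""".stripMargin
      }
    }
    ```
    
    Unrolling instead would paste `consumeBody` once per projection, which is how the method could blow past the 8K threshold mentioned above.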


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11177#issuecomment-183152703
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11177#issuecomment-183497964
  
    **[Test build #51205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51205/consoleFull)** for PR 11177 at commit [`e1fd87d`](https://github.com/apache/spark/commit/e1fd87d45c37caa86d0d6bc287b73928595c755c).


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11177#issuecomment-183554490
  
    LGTM. Merging in master.



[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11177#issuecomment-183520524
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11177#issuecomment-183491739
  
    Part of the generated code for
    ```
    sqlContext.range(N).selectExpr("id", "id % 1000 as k1", "id & 256 as k2")
            .cube("k1", "k2").sum("id").collect()
    ```
    ```
    /* 075 */       /* (input[0, bigint] % 1000) */
    /* 076 */       boolean project_isNull1 = false;
    /* 077 */       long project_value1 = -1L;
    /* 078 */       if (false || 1000L == 0) {
    /* 079 */         project_isNull1 = true;
    /* 080 */       } else {
    /* 081 */         if (false) {
    /* 082 */           project_isNull1 = true;
    /* 083 */         } else {
    /* 084 */           project_value1 = (long)(range_value % 1000L);
    /* 085 */         }
    /* 086 */       }
    /* 087 */       /* (input[0, bigint] & 256) */
    /* 088 */       long project_value4 = -1L;
    /* 089 */       project_value4 = range_value & 256L;
    /* 090 */       /* (input[0, bigint] % 1000) */
    /* 091 */       boolean project_isNull7 = false;
    /* 092 */       long project_value7 = -1L;
    /* 093 */       if (false || 1000L == 0) {
    /* 094 */         project_isNull7 = true;
    /* 095 */       } else {
    /* 096 */         if (false) {
    /* 097 */           project_isNull7 = true;
    /* 098 */         } else {
    /* 099 */           project_value7 = (long)(range_value % 1000L);
    /* 100 */         }
    /* 101 */       }
    /* 102 */       /* (input[0, bigint] & 256) */
    /* 103 */       long project_value10 = -1L;
    /* 104 */       project_value10 = range_value & 256L;
    /* 105 */
    /* 106 */       boolean expand_isNull3 = true;
    /* 107 */       long expand_value3 = -1L;
    /* 108 */
    /* 109 */       boolean expand_isNull4 = true;
    /* 110 */       long expand_value4 = -1L;
    /* 111 */
    /* 112 */       boolean expand_isNull5 = true;
    /* 113 */       int expand_value5 = -1;
    /* 114 */       for (int expand_i = 0; expand_i < 4; expand_i ++) {
    /* 115 */         switch (expand_i) {
    /* 116 */         case 0:
    /* 117 */           expand_isNull3 = project_isNull7;
    /* 118 */           expand_value3 = project_value7;
    /* 119 */
    /* 120 */           expand_isNull4 = false;
    /* 121 */           expand_value4 = project_value10;
    /* 122 */
    /* 123 */           expand_isNull5 = false;
    /* 124 */           expand_value5 = 0;
    /* 125 */           break;
    /* 126 */
    /* 127 */         case 1:
    /* 128 */           expand_isNull3 = project_isNull7;
    /* 129 */           expand_value3 = project_value7;
    /* 130 */
    /* 131 */           /* null */
    /* 132 */           final long expand_value10 = -1L;
    /* 133 */           expand_isNull4 = true;
    /* 134 */           expand_value4 = expand_value10;
    /* 135 */
    /* 136 */           expand_isNull5 = false;
    /* 137 */           expand_value5 = 1;
    /* 138 */           break;
    /* 139 */
    /* 140 */         case 2:
    /* 141 */           /* null */
    /* 142 */           final long expand_value12 = -1L;
    /* 143 */           expand_isNull3 = true;
    /* 144 */           expand_value3 = expand_value12;
    /* 145 */
    /* 146 */           expand_isNull4 = false;
    /* 147 */           expand_value4 = project_value10;
    /* 148 */
    /* 149 */           expand_isNull5 = false;
    /* 150 */           expand_value5 = 2;
    /* 151 */           break;
    /* 152 */
    /* 153 */         case 3:
    /* 154 */           /* null */
    /* 155 */           final long expand_value15 = -1L;
    /* 156 */           expand_isNull3 = true;
    /* 157 */           expand_value3 = expand_value15;
    /* 158 */
    /* 159 */           /* null */
    /* 160 */           final long expand_value16 = -1L;
    /* 161 */           expand_isNull4 = true;
    /* 162 */           expand_value4 = expand_value16;
    /* 163 */
    /* 164 */           expand_isNull5 = false;
    /* 165 */           expand_value5 = 3;
    /* 166 */           break;
    /* 167 */         }
    /* 168 */         expand_metricValue.add(1);
    /* 169 */
    /* 170 */         // generate grouping key
    /* 171 */         agg_rowWriter.zeroOutNullBytes();
    /* 172 */
    /* 173 */         if (expand_isNull3) {
    /* 174 */           agg_rowWriter.setNullAt(0);
    /* 175 */         } else {
    /* 176 */           agg_rowWriter.write(0, expand_value3);
    /* 177 */         }
    /* 178 */
    /* 179 */         if (expand_isNull4) {
    /* 180 */           agg_rowWriter.setNullAt(1);
    /* 181 */         } else {
    /* 182 */           agg_rowWriter.write(1, expand_value4);
    /* 183 */         }
    /* 184 */
    /* 185 */         if (expand_isNull5) {
    /* 186 */           agg_rowWriter.setNullAt(2);
    /* 187 */         } else {
    /* 188 */           agg_rowWriter.write(2, expand_value5);
    /* 189 */         }
    /* 190 */         /* hash(input[0, bigint],input[1, bigint],input[2, int],42) */
    /* 191 */         int agg_value3 = 42;
    /* 192 */
    /* 193 */         if (!expand_isNull3) {
    /* 194 */           agg_value3 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashLong(expand_value3, agg_value3);
    /* 195 */         }
    /* 196 */
    /* 197 */         if (!expand_isNull4) {
    /* 198 */           agg_value3 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashLong(expand_value4, agg_value3);
    /* 199 */         }
    /* 200 */
    /* 201 */         agg_value3 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(expand_value5, agg_value3);
    /* 202 */         UnsafeRow agg_aggBuffer = null;
    /* 203 */         if (true) {
    /* 204 */           // try to get the buffer from hash map
    /* 205 */           agg_aggBuffer = agg_hashMap.getAggregationBufferFromUnsafeRow(agg_result, agg_value3);
    /* 206 */         }
    /* 207 */         if (agg_aggBuffer == null) {
    /* 208 */           if (agg_sorter == null) {
    /* 209 */             agg_sorter = agg_hashMap.destructAndCreateExternalSorter();
    /* 210 */           } else {
    /* 211 */             agg_sorter.merge(agg_hashMap.destructAndCreateExternalSorter());
    /* 212 */           }
    /* 213 */
    /* 214 */           // the hash map had be spilled, it should have enough memory now,
    /* 215 */           // try  to allocate buffer again.
    /* 216 */           agg_aggBuffer = agg_hashMap.getAggregationBufferFromUnsafeRow(agg_result, agg_value3);
    /* 217 */           if (agg_aggBuffer == null) {
    /* 218 */             // failed to allocate the first page
    /* 219 */             throw new OutOfMemoryError("No enough memory for aggregation");
    /* 220 */           }
    /* 221 */         }
    ```
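    
    For reference, `cube("k1", "k2")` produces four grouping-set projections: (k1, k2), (k1, null), (null, k2), and (null, null). That is why the loop above iterates `expand_i` from 0 to 3, and why the third output column (`expand_value5`) carries the grouping ID.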


[GitHub] spark pull request: [SPARK-13293] [SQL] generate Expand

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11177#issuecomment-183520529
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51205/
    Test PASSed.

