You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2016/01/29 05:52:19 UTC

[GitHub] spark pull request: [SPARK-13072][SQL] simplify and improve murmur...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/10974

    [SPARK-13072][SQL] simplify and improve murmur3 hash expression codegen

    simplify(remove several unnecessary local variables) the generated code of hash expression, and avoid null check if possible.
    
    generated code comparison for `hash(int, double, string, array<string>)`:
    **before:**
    ```
      public UnsafeRow apply(InternalRow i) {
        /* hash(input[0, int],input[1, double],input[2, string],input[3, array<int>],42) */
        int value1 = 42;
        /* input[0, int] */
        int value3 = i.getInt(0);
        if (!false) {
          value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(value3, value1);
        }
        /* input[1, double] */
        double value5 = i.getDouble(1);
        if (!false) {
          value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashLong(Double.doubleToLongBits(value5), value1);
        }
        /* input[2, string] */
        boolean isNull6 = i.isNullAt(2);
        UTF8String value7 = isNull6 ? null : (i.getUTF8String(2));
        if (!isNull6) {
          value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value7.getBaseObject(), value7.getBaseOffset(), value7.numBytes(), value1);
        }
        /* input[3, array<int>] */
        boolean isNull8 = i.isNullAt(3);
        ArrayData value9 = isNull8 ? null : (i.getArray(3));
        if (!isNull8) {
          int result10 = value1;
          for (int index11 = 0; index11 < value9.numElements(); index11++) {
            if (!value9.isNullAt(index11)) {
              final int element12 = value9.getInt(index11);
              result10 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element12, result10);
            }
          }
          value1 = result10;
        }
      }
    ```
    **after:**
    ```
      public UnsafeRow apply(InternalRow i) {
        /* hash(input[0, int],input[1, double],input[2, string],input[3, array<int>],42) */
        int value1 = 42;
        /* input[0, int] */
        int value3 = i.getInt(0);
        value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(value3, value1);
        /* input[1, double] */
        double value5 = i.getDouble(1);
        value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashLong(Double.doubleToLongBits(value5), value1);
        /* input[2, string] */
        boolean isNull6 = i.isNullAt(2);
        UTF8String value7 = isNull6 ? null : (i.getUTF8String(2));
    
        if (!isNull6) {
          value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value7.getBaseObject(), value7.getBaseOffset(), value7.numBytes(), value1);
        }
    
        /* input[3, array<int>] */
        boolean isNull8 = i.isNullAt(3);
        ArrayData value9 = isNull8 ? null : (i.getArray(3));
        if (!isNull8) {
          for (int index10 = 0; index10 < value9.numElements(); index10++) {
            final int element11 = value9.getInt(index10);
            value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element11, value1);
          }
        }
    
        rowWriter14.write(0, value1);
        return result12;
      }
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark codegen

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10974.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10974
    
----
commit f6a3cccab507c66f29f12132c9656bb409263a13
Author: Wenchen Fan <we...@databricks.com>
Date:   2016-01-29T02:13:25Z

    simplify and improve murmur3 hash expression codegen

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13072][SQL] simplify and improve murmur...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10974#issuecomment-176896153
  
    Merging this into master, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13072][SQL] simplify and improve murmur...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10974


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13072][SQL] simplify and improve murmur...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10974#discussion_r51295107
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
    @@ -365,91 +391,48 @@ case class Murmur3Hash(children: Seq[Expression], seed: Int) extends Expression
               hashLong(s"$input.toUnscaledLong()")
             } else {
               val bytes = ctx.freshName("bytes")
    -          val code = s"byte[] $bytes = $input.toJavaBigDecimal().unscaledValue().toByteArray();"
    -          val offset = "Platform.BYTE_ARRAY_OFFSET"
    -          val result = s"$hasher.hashUnsafeBytes($bytes, $offset, $bytes.length, $seed)"
    -          ExprCode(code, "false", result)
    +          s"""
    +            final byte[] $bytes = $input.toJavaBigDecimal().unscaledValue().toByteArray();
    +            ${hashBytes(bytes)}
    +          """
             }
           case CalendarIntervalType =>
    -        val microsecondsHash = s"$hasher.hashLong($input.microseconds, $seed)"
    -        val monthsHash = s"$hasher.hashInt($input.months, $microsecondsHash)"
    -        inlineValue(monthsHash)
    -      case BinaryType =>
    -        val offset = "Platform.BYTE_ARRAY_OFFSET"
    -        inlineValue(s"$hasher.hashUnsafeBytes($input, $offset, $input.length, $seed)")
    +        val microsecondsHash = s"$hasher.hashLong($input.microseconds, $result)"
    +        s"$result = $hasher.hashInt($input.months, $microsecondsHash);"
    +      case BinaryType => hashBytes(input)
           case StringType =>
             val baseObject = s"$input.getBaseObject()"
             val baseOffset = s"$input.getBaseOffset()"
             val numBytes = s"$input.numBytes()"
    -        inlineValue(s"$hasher.hashUnsafeBytes($baseObject, $baseOffset, $numBytes, $seed)")
    +        s"$result = $hasher.hashUnsafeBytes($baseObject, $baseOffset, $numBytes, $result);"
     
    -      case ArrayType(et, _) =>
    -        val result = ctx.freshName("result")
    +      case ArrayType(et, containsNull) =>
             val index = ctx.freshName("index")
    -        val element = ctx.freshName("element")
    -        val elementHash = computeHash(element, et, result, ctx)
    -        val code =
    -          s"""
    -            int $result = $seed;
    -            for (int $index = 0; $index < $input.numElements(); $index++) {
    -              if (!$input.isNullAt($index)) {
    -                final ${ctx.javaType(et)} $element = ${ctx.getValue(input, et, index)};
    -                ${elementHash.code}
    -                $result = ${elementHash.value};
    -              }
    -            }
    -          """
    -        ExprCode(code, "false", result)
    +        s"""
    +          for (int $index = 0; $index < $input.numElements(); $index++) {
    +            ${nullSafeElementHash(input, index, containsNull, et, result, ctx)}
    +          }
    +        """
     
    -      case MapType(kt, vt, _) =>
    -        val result = ctx.freshName("result")
    +      case MapType(kt, vt, valueContainsNull) =>
             val index = ctx.freshName("index")
             val keys = ctx.freshName("keys")
             val values = ctx.freshName("values")
    -        val key = ctx.freshName("key")
    -        val value = ctx.freshName("value")
    -        val keyHash = computeHash(key, kt, result, ctx)
    -        val valueHash = computeHash(value, vt, result, ctx)
    -        val code =
    -          s"""
    -            int $result = $seed;
    -            final ArrayData $keys = $input.keyArray();
    -            final ArrayData $values = $input.valueArray();
    -            for (int $index = 0; $index < $input.numElements(); $index++) {
    -              final ${ctx.javaType(kt)} $key = ${ctx.getValue(keys, kt, index)};
    -              ${keyHash.code}
    -              $result = ${keyHash.value};
    -              if (!$values.isNullAt($index)) {
    -                final ${ctx.javaType(vt)} $value = ${ctx.getValue(values, vt, index)};
    -                ${valueHash.code}
    -                $result = ${valueHash.value};
    -              }
    -            }
    -          """
    -        ExprCode(code, "false", result)
    +        s"""
    +          final ArrayData $keys = $input.keyArray();
    +          final ArrayData $values = $input.valueArray();
    +          for (int $index = 0; $index < $input.numElements(); $index++) {
    +            ${nullSafeElementHash(keys, index, false, kt, result, ctx)}
    +            ${nullSafeElementHash(values, index, valueContainsNull, vt, result, ctx)}
    +          }
    +        """
     
           case StructType(fields) =>
    -        val result = ctx.freshName("result")
    -        val fieldsHash = fields.map(_.dataType).zipWithIndex.map {
    -          case (dt, index) =>
    -            val field = ctx.freshName("field")
    -            val fieldHash = computeHash(field, dt, result, ctx)
    -            s"""
    -              if (!$input.isNullAt($index)) {
    -                final ${ctx.javaType(dt)} $field = ${ctx.getValue(input, dt, index.toString)};
    -                ${fieldHash.code}
    -                $result = ${fieldHash.value};
    -              }
    -            """
    +        fields.zipWithIndex.map { case (field, index) =>
    +          nullSafeElementHash(input, index.toString, field.nullable, field.dataType, result, ctx)
             }.mkString("\n")
    -        val code =
    -          s"""
    -            int $result = $seed;
    -            $fieldsHash
    -          """
    -        ExprCode(code, "false", result)
     
    -      case udt: UserDefinedType[_] => computeHash(input, udt.sqlType, seed, ctx)
    +      case udt: UserDefinedType[_] => computeHash(input, udt.sqlType, result, ctx)
    --- End diff --
    
    it's not a bug, actually I rename `seed` to `result` in this method, to make it clear that we use the previous result as seed to produce new result.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13072][SQL] simplify and improve murmur...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10974#issuecomment-176599067
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13072][SQL] simplify and improve murmur...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10974#discussion_r51294294
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
    @@ -365,91 +391,48 @@ case class Murmur3Hash(children: Seq[Expression], seed: Int) extends Expression
               hashLong(s"$input.toUnscaledLong()")
             } else {
               val bytes = ctx.freshName("bytes")
    -          val code = s"byte[] $bytes = $input.toJavaBigDecimal().unscaledValue().toByteArray();"
    -          val offset = "Platform.BYTE_ARRAY_OFFSET"
    -          val result = s"$hasher.hashUnsafeBytes($bytes, $offset, $bytes.length, $seed)"
    -          ExprCode(code, "false", result)
    +          s"""
    +            final byte[] $bytes = $input.toJavaBigDecimal().unscaledValue().toByteArray();
    +            ${hashBytes(bytes)}
    +          """
             }
           case CalendarIntervalType =>
    -        val microsecondsHash = s"$hasher.hashLong($input.microseconds, $seed)"
    -        val monthsHash = s"$hasher.hashInt($input.months, $microsecondsHash)"
    -        inlineValue(monthsHash)
    -      case BinaryType =>
    -        val offset = "Platform.BYTE_ARRAY_OFFSET"
    -        inlineValue(s"$hasher.hashUnsafeBytes($input, $offset, $input.length, $seed)")
    +        val microsecondsHash = s"$hasher.hashLong($input.microseconds, $result)"
    +        s"$result = $hasher.hashInt($input.months, $microsecondsHash);"
    +      case BinaryType => hashBytes(input)
           case StringType =>
             val baseObject = s"$input.getBaseObject()"
             val baseOffset = s"$input.getBaseOffset()"
             val numBytes = s"$input.numBytes()"
    -        inlineValue(s"$hasher.hashUnsafeBytes($baseObject, $baseOffset, $numBytes, $seed)")
    +        s"$result = $hasher.hashUnsafeBytes($baseObject, $baseOffset, $numBytes, $result);"
     
    -      case ArrayType(et, _) =>
    -        val result = ctx.freshName("result")
    +      case ArrayType(et, containsNull) =>
             val index = ctx.freshName("index")
    -        val element = ctx.freshName("element")
    -        val elementHash = computeHash(element, et, result, ctx)
    -        val code =
    -          s"""
    -            int $result = $seed;
    -            for (int $index = 0; $index < $input.numElements(); $index++) {
    -              if (!$input.isNullAt($index)) {
    -                final ${ctx.javaType(et)} $element = ${ctx.getValue(input, et, index)};
    -                ${elementHash.code}
    -                $result = ${elementHash.value};
    -              }
    -            }
    -          """
    -        ExprCode(code, "false", result)
    +        s"""
    +          for (int $index = 0; $index < $input.numElements(); $index++) {
    +            ${nullSafeElementHash(input, index, containsNull, et, result, ctx)}
    +          }
    +        """
     
    -      case MapType(kt, vt, _) =>
    -        val result = ctx.freshName("result")
    +      case MapType(kt, vt, valueContainsNull) =>
             val index = ctx.freshName("index")
             val keys = ctx.freshName("keys")
             val values = ctx.freshName("values")
    -        val key = ctx.freshName("key")
    -        val value = ctx.freshName("value")
    -        val keyHash = computeHash(key, kt, result, ctx)
    -        val valueHash = computeHash(value, vt, result, ctx)
    -        val code =
    -          s"""
    -            int $result = $seed;
    -            final ArrayData $keys = $input.keyArray();
    -            final ArrayData $values = $input.valueArray();
    -            for (int $index = 0; $index < $input.numElements(); $index++) {
    -              final ${ctx.javaType(kt)} $key = ${ctx.getValue(keys, kt, index)};
    -              ${keyHash.code}
    -              $result = ${keyHash.value};
    -              if (!$values.isNullAt($index)) {
    -                final ${ctx.javaType(vt)} $value = ${ctx.getValue(values, vt, index)};
    -                ${valueHash.code}
    -                $result = ${valueHash.value};
    -              }
    -            }
    -          """
    -        ExprCode(code, "false", result)
    +        s"""
    +          final ArrayData $keys = $input.keyArray();
    +          final ArrayData $values = $input.valueArray();
    +          for (int $index = 0; $index < $input.numElements(); $index++) {
    +            ${nullSafeElementHash(keys, index, false, kt, result, ctx)}
    +            ${nullSafeElementHash(values, index, valueContainsNull, vt, result, ctx)}
    +          }
    +        """
     
           case StructType(fields) =>
    -        val result = ctx.freshName("result")
    -        val fieldsHash = fields.map(_.dataType).zipWithIndex.map {
    -          case (dt, index) =>
    -            val field = ctx.freshName("field")
    -            val fieldHash = computeHash(field, dt, result, ctx)
    -            s"""
    -              if (!$input.isNullAt($index)) {
    -                final ${ctx.javaType(dt)} $field = ${ctx.getValue(input, dt, index.toString)};
    -                ${fieldHash.code}
    -                $result = ${fieldHash.value};
    -              }
    -            """
    +        fields.zipWithIndex.map { case (field, index) =>
    +          nullSafeElementHash(input, index.toString, field.nullable, field.dataType, result, ctx)
             }.mkString("\n")
    -        val code =
    -          s"""
    -            int $result = $seed;
    -            $fieldsHash
    -          """
    -        ExprCode(code, "false", result)
     
    -      case udt: UserDefinedType[_] => computeHash(input, udt.sqlType, seed, ctx)
    +      case udt: UserDefinedType[_] => computeHash(input, udt.sqlType, result, ctx)
    --- End diff --
    
    This is a bug fix right? Do we have tests for this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13072][SQL] simplify and improve murmur...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10974#issuecomment-176598205
  
    **[Test build #50341 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50341/consoleFull)** for PR 10974 at commit [`f6a3ccc`](https://github.com/apache/spark/commit/f6a3cccab507c66f29f12132c9656bb409263a13).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13072][SQL] simplify and improve murmur...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on the pull request:

    https://github.com/apache/spark/pull/10974#issuecomment-176895329
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13072][SQL] simplify and improve murmur...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10974#issuecomment-176578937
  
    **[Test build #50341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50341/consoleFull)** for PR 10974 at commit [`f6a3ccc`](https://github.com/apache/spark/commit/f6a3cccab507c66f29f12132c9656bb409263a13).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13072][SQL] simplify and improve murmur...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10974#discussion_r51295244
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
    @@ -325,36 +325,62 @@ case class Murmur3Hash(children: Seq[Expression], seed: Int) extends Expression
     
       override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
         ev.isNull = "false"
    -    val childrenHash = children.zipWithIndex.map {
    -      case (child, dt) =>
    -        val childGen = child.gen(ctx)
    -        val childHash = computeHash(childGen.value, child.dataType, ev.value, ctx)
    -        s"""
    -          ${childGen.code}
    -          if (!${childGen.isNull}) {
    -            ${childHash.code}
    -            ${ev.value} = ${childHash.value};
    -          }
    -        """
    +    val childrenHash = children.map { child =>
    +      val childGen = child.gen(ctx)
    +      childGen.code + generateNullCheck(child.nullable, childGen.isNull) {
    +        computeHash(childGen.value, child.dataType, ev.value, ctx)
    +      }
         }.mkString("\n")
    +
         s"""
           int ${ev.value} = $seed;
           $childrenHash
         """
       }
     
    +  private def generateNullCheck(nullable: Boolean, isNull: String)(execution: String): String = {
    --- End diff --
    
    I think so, we can promote it to `CodeGenContext` and use it in other places in a follow-up PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13072][SQL] simplify and improve murmur...

Posted by nongli <gi...@git.apache.org>.

Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10974#discussion_r51294176
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
    @@ -325,36 +325,62 @@ case class Murmur3Hash(children: Seq[Expression], seed: Int) extends Expression
     
       override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
         ev.isNull = "false"
    -    val childrenHash = children.zipWithIndex.map {
    -      case (child, dt) =>
    -        val childGen = child.gen(ctx)
    -        val childHash = computeHash(childGen.value, child.dataType, ev.value, ctx)
    -        s"""
    -          ${childGen.code}
    -          if (!${childGen.isNull}) {
    -            ${childHash.code}
    -            ${ev.value} = ${childHash.value};
    -          }
    -        """
    +    val childrenHash = children.map { child =>
    +      val childGen = child.gen(ctx)
    +      childGen.code + generateNullCheck(child.nullable, childGen.isNull) {
    +        computeHash(childGen.value, child.dataType, ev.value, ctx)
    +      }
         }.mkString("\n")
    +
         s"""
           int ${ev.value} = $seed;
           $childrenHash
         """
       }
     
    +  private def generateNullCheck(nullable: Boolean, isNull: String)(execution: String): String = {
    --- End diff --
    
    Are there other places we can use this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13072][SQL] simplify and improve murmur...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10974#issuecomment-176572188
  
    cc @davies @nongli 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13072][SQL] simplify and improve murmur...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10974#issuecomment-176613193
  
    @rxin similar to https://github.com/apache/spark/pull/10333 , I only observed about 5% speed up. I think branch prediction works very well for this kind of case, the major benefit we can get is making generated code simplier and easier to debug.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13072][SQL] simplify and improve murmur...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10974#issuecomment-176599074
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50341/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13072][SQL] simplify and improve murmur...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10974#issuecomment-176579204
  
    Any perf improvements?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org