Posted to reviews@spark.apache.org by yucai <gi...@git.apache.org> on 2016/02/16 03:48:48 UTC

[GitHub] spark pull request: [WIP][SQL] Decimal datatype support for pow

GitHub user yucai opened a pull request:

    https://github.com/apache/spark/pull/11212

    [WIP][SQL] Decimal datatype support for pow

    Decimal datatype support for pow
    - when the base is Decimal and the exponent is an integer type (Byte, Short, Int), return Decimal
    - otherwise, return Double
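
    For illustration only, a minimal standalone Scala sketch of the proposed typing rule, using `scala.BigDecimal` in place of Spark's internal `Decimal` (the `decimalPow` helper below is hypothetical and not part of the patch):

    ```
    // Hypothetical sketch of the rule: a Decimal base with an integral exponent
    // (Byte/Short/Int) keeps exact decimal arithmetic; anything else falls back
    // to Double math, as today.
    def decimalPow(base: BigDecimal, exp: Any): Any = exp match {
      case e: Byte  => base.pow(e.toInt)   // exact, Decimal result
      case e: Short => base.pow(e.toInt)   // exact, Decimal result
      case e: Int   => base.pow(e)         // exact, Decimal result
      case e: java.lang.Number =>          // Float/Double/etc.: Double result
        math.pow(base.toDouble, e.doubleValue)
    }

    // decimalPow(BigDecimal("2.00"), 3)   -> 8.000000 (BigDecimal)
    // decimalPow(BigDecimal("2.00"), 0.5) -> 1.4142135623730951 (Double)
    ```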

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yucai/spark decimal_for_pow

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11212.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11212
    
----
commit cce48076151403cf456493e5b3756bc7aa6e1713
Author: Yucai Yu <yu...@intel.com>
Date:   2016-02-02T02:52:06Z

    Decimal support for pow

commit e353ab5ac36e53103def8f17171e805f5ad73872
Author: Yucai Yu <yu...@intel.com>
Date:   2016-02-16T02:39:18Z

    Merge remote-tracking branch 'origin/master' into up_decimal

----




[GitHub] spark pull request #11212: [SPARK-13332][SQL] Decimal datatype support for S...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11212




[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...

Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11212#discussion_r53115148
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
    @@ -170,6 +170,7 @@ public void write(int ordinal, double value) {
       }
     
       public void write(int ordinal, Decimal input, int precision, int scale) {
    +    input = input.clone();
    --- End diff --
    
    Here we'll call `changePrecision` on `input`, which would affect the original data. I agree that this is a bad idea; maybe we need to propose a separate PR to work around this.




[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...

Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11212#discussion_r52968287
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathFunctionsSuite.scala ---
    @@ -351,6 +350,20 @@ class MathFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper {
       }
     
       test("pow") {
    +    testBinary(Pow, (d: Decimal, n: Byte) => d.pow(n),
    +      (-5 to 5).map(v => (Decimal(v * 1.0), v.toByte)))
    --- End diff --
    
    maybe `v.toDouble` is better
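
    A hedged sketch of the suggested tweak (behaviour should be identical; `v.toDouble` just states the intent more clearly than `v * 1.0`):

    ```
    import org.apache.spark.sql.types.Decimal

    // Same domain as in the diff above, with the multiplication replaced by an
    // explicit conversion.
    val domain: Seq[(Decimal, Byte)] = (-5 to 5).map(v => (Decimal(v.toDouble), v.toByte))
    ```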




[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11212#issuecomment-184860431
  
    I think it'd be a lot simpler if we created a separate Pow for Decimal and handled the byte/short/etc.-to-integer conversion in type coercion, rather than in the PowDecimal class.





[GitHub] spark pull request: [WIP][SQL] Decimal datatype support for pow

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the pull request:

    https://github.com/apache/spark/pull/11212#issuecomment-184495551
  
    @adrian-wang could you help review? Many thanks!




[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...

Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11212#discussion_r52968376
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
    @@ -170,6 +170,7 @@ public void write(int ordinal, double value) {
       }
     
       public void write(int ordinal, Decimal input, int precision, int scale) {
    +    input = input.clone();
    --- End diff --
    
    Better to add a comment that explains why we need to clone before writing.




[GitHub] spark pull request: [WIP][SQL] Decimal datatype support for pow

Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11212#discussion_r52968214
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathFunctionsSuite.scala ---
    @@ -103,8 +103,7 @@ class MathFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper {
           }
         } else {
           domain.foreach { case (v1, v2) =>
    -        checkEvaluation(c(Literal(v1), Literal(v2)), f(v1 + 0.0, v2 + 0.0), EmptyRow)
    -        checkEvaluation(c(Literal(v2), Literal(v1)), f(v2 + 0.0, v1 + 0.0), EmptyRow)
    +        checkEvaluation(c(Literal(v1), Literal(v2)), f(v1, v2), EmptyRow)
    --- End diff --
    
    Keep the test of `f(v2, v1)` as well.




[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11212#discussion_r53275413
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
    @@ -170,6 +170,7 @@ public void write(int ordinal, double value) {
       }
     
       public void write(int ordinal, Decimal input, int precision, int scale) {
    +    input = input.clone();
    --- End diff --
    
    As Adrian mentioned, we need a copy of the input; otherwise `changePrecision` would change the original value.
    In our case, this means `catalystValue` (the expected value) would be changed when `checkEvalutionWithUnsafeProjection` is invoked, and then all checks after `checkEvalutionWithUnsafeProjection` will fail.
    ```
      protected def checkEvaluation(
          expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow): Unit = {
        val catalystValue = CatalystTypeConverters.convertToCatalyst(expected)
        checkEvaluationWithoutCodegen(expression, catalystValue, inputRow)
        checkEvaluationWithGeneratedMutableProjection(expression, catalystValue, inputRow)
        if (GenerateUnsafeProjection.canSupport(expression.dataType)) {
          checkEvalutionWithUnsafeProjection(expression, catalystValue, inputRow)
        }
        checkEvaluationWithOptimization(expression, catalystValue, inputRow)
      }
    ```
    Does this make sense? Any suggestion would be greatly appreciated.
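
    For reference, a minimal sketch of the mutation problem (the precision/scale values are illustrative only; `changePrecision` and `clone` are the same `Decimal` methods shown in the diff above):

    ```
    import org.apache.spark.sql.types.Decimal

    // Without a copy: UnsafeRowWriter's changePrecision mutates the Decimal in
    // place, so the test's expected value changes under it.
    val expected = Decimal("1.2345")
    val written  = expected                // same object, no copy
    written.changePrecision(4, 2)          // expected is now 1.23 as well

    // With the clone added in this PR, the caller's value stays intact.
    val expected2 = Decimal("1.2345")
    val copy = expected2.clone()
    copy.changePrecision(4, 2)             // only the copy is rounded
    ```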




[GitHub] spark pull request: [WIP][SQL] Decimal datatype support for pow

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11212#issuecomment-184489236
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11212#discussion_r53071927
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java ---
    @@ -170,6 +170,7 @@ public void write(int ordinal, double value) {
       }
     
       public void write(int ordinal, Decimal input, int precision, int scale) {
    +    input = input.clone();
    --- End diff --
    
    Why is this necessary? Seems like a really bad idea.





[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the pull request:

    https://github.com/apache/spark/pull/11212#issuecomment-186077126
  
    @rxin I tried your suggestion of creating a separate `PowDecimal` for Decimal, like below:
     
    ```
    case class PowDecimal(left: Expression, right: Expression)
      extends BinaryMathExpression(math.pow, "POWER") {
      override def inputTypes: Seq[AbstractDataType] = Seq(DecimalType, IntegerType)
      ...
    }
    
    case class Pow(left: Expression, right: Expression)
      extends BinaryMathExpression(math.pow, "POWER") {
      ...
    } 
    ``` 
    
    But one concern is: for `select pow(cast(2 as decimal(5,2)), 3)`, how do we get a `PowDecimal` node created? The current path will create a `Pow` node anyway.
    
    So we think maybe we can still keep the Decimal processing in `Pow`, but handle the byte/short/etc.-to-integer conversion in type coercion, like below:
    
    ```
    case class Pow(left: Expression, right: Expression)
      extends BinaryMathExpression(math.pow, "POWER") {
      override def inputTypes: Seq[AbstractDataType] = Seq(NumericType, NumericType)
    
      override def dataType: DataType = (left.dataType, right.dataType) match {
        case (dt: DecimalType, ByteType | ShortType | IntegerType) => dt
        case _ => DoubleType
      }
      protected override def nullSafeEval(input1: Any, input2: Any): Any =
        (left.dataType, right.dataType) match {
          case (dt: DecimalType, _) => input1.asInstanceOf[Decimal].pow(input2.asInstanceOf[Int])
          case _ => math.pow(input1.asInstanceOf[Double], input2.asInstanceOf[Double])
        }
      override def genCode(ctx: CodegenContext, ev: ExprCode): String = ...
    }
    ```

    In HiveTypeCoercion:

    ```
      object PowCoercion extends Rule[LogicalPlan] {
        def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
          case e if !e.childrenResolved => e
          case e @ Pow(left, right) =>
            (left.dataType, right.dataType) match {
              case (dt: DecimalType, IntegerType) => e
              case (DoubleType, DoubleType) => e
              case (dt: DecimalType, ByteType | ShortType) =>
                Pow(left, Cast(right, IntegerType))
              case _ => Pow(Cast(left, DoubleType), Cast(right, DoubleType))
            }
        }
      }
    ```
    What do you think of this approach?




[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...

Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11212#discussion_r52968764
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
    @@ -523,11 +523,45 @@ case class Atan2(left: Expression, right: Expression)
     
     case class Pow(left: Expression, right: Expression)
       extends BinaryMathExpression(math.pow, "POWER") {
    -  override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
    -    defineCodeGen(ctx, ev, (c1, c2) => s"java.lang.Math.pow($c1, $c2)")
    -  }
    -}
    +  override def inputTypes: Seq[AbstractDataType] = Seq(NumericType, NumericType)
    +
    +  override def dataType: DataType = (left.dataType, right.dataType) match {
    +    case (dt: DecimalType, ByteType | ShortType | IntegerType) => dt
    +    case _ => DoubleType
    +  }
    +
    +  protected override def nullSafeEval(input1: Any, input2: Any): Any =
    +    (left.dataType, right.dataType) match {
    +      case (dt: DecimalType, ByteType) =>
    +        input1.asInstanceOf[Decimal].pow(input2.asInstanceOf[Byte])
    +      case (dt: DecimalType, ShortType) =>
    +        input1.asInstanceOf[Decimal].pow(input2.asInstanceOf[Short])
    +      case (dt: DecimalType, IntegerType) =>
    +        input1.asInstanceOf[Decimal].pow(input2.asInstanceOf[Int])
    +      case (dt: DecimalType, FloatType) =>
    +        math.pow(input1.asInstanceOf[Decimal].toDouble, input2.asInstanceOf[Float])
    +      case (dt: DecimalType, DoubleType) =>
    +        math.pow(input1.asInstanceOf[Decimal].toDouble, input2.asInstanceOf[Double])
    +      case (dt1: DecimalType, dt2: DecimalType) =>
    +        math.pow(input1.asInstanceOf[Decimal].toDouble, input2.asInstanceOf[Decimal].toDouble)
    --- End diff --
    
    Shall we cast the result of `math.pow` back to `DecimalType` for these three cases?
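
    For example, a hedged sketch of what that round-trip could look like (standalone, outside the expression; note it is lossy, which is part of the question):

    ```
    import org.apache.spark.sql.types.Decimal

    // Sketch only: compute in Double, then wrap the result back into a Decimal
    // so the value matches a DecimalType dataType.
    val base: Decimal = Decimal("2.5")
    val result: Decimal = Decimal(math.pow(base.toDouble, 0.5))   // lossy round-trip
    ```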




[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the pull request:

    https://github.com/apache/spark/pull/11212#issuecomment-185717402
  
    OK, let me try this implementation.




[GitHub] spark issue #11212: [SPARK-13332][SQL] Decimal datatype support for SQL pow

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/11212
  
    Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it or create a new one. We can also continue the discussion on the JIRA ticket.

