You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/01/26 16:47:12 UTC

[GitHub] [iceberg] dilipbiswal opened a new pull request #2153: Add truncate transform support for MERGE INTO

dilipbiswal opened a new pull request #2153:
URL: https://github.com/apache/iceberg/pull/2153


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dilipbiswal commented on pull request #2153: Add truncate transform support for MERGE INTO

Posted by GitBox <gi...@apache.org>.
dilipbiswal commented on pull request #2153:
URL: https://github.com/apache/iceberg/pull/2153#issuecomment-768741155


   @rdblue @aokolnychyi 
   Could we please check if this looks okay now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2153: Add truncate transform support for MERGE INTO

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2153:
URL: https://github.com/apache/iceberg/pull/2153#discussion_r564737266



##########
File path: spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/utils/RewriteRowLevelOperationHelper.scala
##########
@@ -240,6 +244,25 @@ trait RewriteRowLevelOperationHelper extends PredicateHelper with Logging {
     }
   }
 
+  private object Lit {
+    def unapply[T](literal: Literal[T]): Some[(T, DataType)] = {
+      Some((literal.value, literal.dataType))
+    }
+  }
+
+  private object TruncateTransform {
+    def unapply(transform: Transform): Option[(FieldReference, Int)] = transform match {
+      case at @ ApplyTransform(name, _) if name.equalsIgnoreCase("truncate")  =>
+        at.args match {
+        case Seq(nf: NamedReference, Lit(value: Int, IntegerType)) =>
+          Some(FieldReference(nf.fieldNames()), value)

Review comment:
       I think the convention in Spark is to have the transform width first, like `bucket(4, id)`. In Iceberg, we usually detect it in either location. Could you add a case for `Seq(Lit, NamedReference)` to handle that order as well?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2153: Add truncate transform support for MERGE INTO

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2153:
URL: https://github.com/apache/iceberg/pull/2153#issuecomment-767831128


   LGTM other than existing comments.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dilipbiswal commented on a change in pull request #2153: Add truncate transform support for MERGE INTO

Posted by GitBox <gi...@apache.org>.
dilipbiswal commented on a change in pull request #2153:
URL: https://github.com/apache/iceberg/pull/2153#discussion_r564668207



##########
File path: spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/expressions/TransformExpressions.scala
##########
@@ -119,3 +118,38 @@ case class IcebergBucketTransform(numBuckets: Int, child: Expression) extends Ic
 
   override def dataType: DataType = IntegerType
 }
+
+case class IcebergTruncateTransform(child: Expression, width: Int) extends IcebergTransformExpression {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  @transient lazy val truncateFunc: Any => Any = child.dataType match {
+    case _: DecimalType =>
+      val t = Transforms.truncate[Any](icebergInputType, width)
+      d: Any =>  Decimal.apply(t(d.asInstanceOf[Decimal].toJavaBigDecimal).asInstanceOf[java.math.BigDecimal])
+    case _: StringType =>
+      val t = Transforms.truncate[String](Types.StringType.get(), width)
+      s: Any => UTF8String.fromString(t(s.toString))
+    case _: BinaryType =>
+      val t = Transforms.truncate[ByteBuffer](Types.BinaryType.get(), width)
+      s: Any => {
+        val res: ByteBuffer = t(ByteBuffer.wrap(s.asInstanceOf[Array[Byte]]))
+        val returnValue: Array[Byte] = new Array[Byte](res.limit())
+        res.rewind();
+        res.get(returnValue);

Review comment:
       @rdblue / @aokolnychyi Is there a better way to do this here ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #2153: Add truncate transform support for MERGE INTO

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #2153:
URL: https://github.com/apache/iceberg/pull/2153


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2153: Add truncate transform support for MERGE INTO

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2153:
URL: https://github.com/apache/iceberg/pull/2153#discussion_r564832877



##########
File path: spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/expressions/TransformExpressions.scala
##########
@@ -119,3 +118,38 @@ case class IcebergBucketTransform(numBuckets: Int, child: Expression) extends Ic
 
   override def dataType: DataType = IntegerType
 }
+
+case class IcebergTruncateTransform(child: Expression, width: Int) extends IcebergTransformExpression {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  @transient lazy val truncateFunc: Any => Any = child.dataType match {
+    case _: DecimalType =>
+      val t = Transforms.truncate[Any](icebergInputType, width)
+      d: Any =>  Decimal.apply(t(d.asInstanceOf[Decimal].toJavaBigDecimal).asInstanceOf[java.math.BigDecimal])

Review comment:
       nit: extra space




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi edited a comment on pull request #2153: Add truncate transform support for MERGE INTO

Posted by GitBox <gi...@apache.org>.
aokolnychyi edited a comment on pull request #2153:
URL: https://github.com/apache/iceberg/pull/2153#issuecomment-767831128


   LGTM other than existing comments. Thanks, @dilipbiswal!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2153: Add truncate transform support for MERGE INTO

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2153:
URL: https://github.com/apache/iceberg/pull/2153#discussion_r564735505



##########
File path: spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/expressions/TransformExpressions.scala
##########
@@ -119,3 +118,38 @@ case class IcebergBucketTransform(numBuckets: Int, child: Expression) extends Ic
 
   override def dataType: DataType = IntegerType
 }
+
+case class IcebergTruncateTransform(child: Expression, width: Int) extends IcebergTransformExpression {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  @transient lazy val truncateFunc: Any => Any = child.dataType match {
+    case _: DecimalType =>
+      val t = Transforms.truncate[Any](icebergInputType, width)
+      d: Any =>  Decimal.apply(t(d.asInstanceOf[Decimal].toJavaBigDecimal).asInstanceOf[java.math.BigDecimal])
+    case _: StringType =>
+      val t = Transforms.truncate[String](Types.StringType.get(), width)
+      s: Any => UTF8String.fromString(t(s.toString))
+    case _: BinaryType =>
+      val t = Transforms.truncate[ByteBuffer](Types.BinaryType.get(), width)
+      s: Any => {
+        val res: ByteBuffer = t(ByteBuffer.wrap(s.asInstanceOf[Array[Byte]]))
+        val returnValue: Array[Byte] = new Array[Byte](res.limit())
+        res.rewind();
+        res.get(returnValue);

Review comment:
       Yes, I recommend using `ByteBuffers.toByteArray`. That will handle the conversion back to byte array efficiently.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2153: Add truncate transform support for MERGE INTO

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2153:
URL: https://github.com/apache/iceberg/pull/2153#discussion_r564833802



##########
File path: spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/expressions/TransformExpressions.scala
##########
@@ -119,3 +118,38 @@ case class IcebergBucketTransform(numBuckets: Int, child: Expression) extends Ic
 
   override def dataType: DataType = IntegerType
 }
+
+case class IcebergTruncateTransform(child: Expression, width: Int) extends IcebergTransformExpression {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  @transient lazy val truncateFunc: Any => Any = child.dataType match {
+    case _: DecimalType =>
+      val t = Transforms.truncate[Any](icebergInputType, width)

Review comment:
       Can we define this as follows?
   
   ```
   val t = Transforms.truncate[java.math.BigDecimal](icebergInputType, length)
   d: Any =>
     val truncatedValue = t(d.asInstanceOf[Decimal].toJavaBigDecimal)
     Decimal(truncatedValue)
   ```
   
   I think we can get rid of the extra cast by using `Transforms.truncate[java.math.BigDecimal]`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2153: Add truncate transform support for MERGE INTO

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2153:
URL: https://github.com/apache/iceberg/pull/2153#discussion_r564834418



##########
File path: spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/expressions/TransformExpressions.scala
##########
@@ -119,3 +118,38 @@ case class IcebergBucketTransform(numBuckets: Int, child: Expression) extends Ic
 
   override def dataType: DataType = IntegerType
 }
+
+case class IcebergTruncateTransform(child: Expression, width: Int) extends IcebergTransformExpression {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  @transient lazy val truncateFunc: Any => Any = child.dataType match {
+    case _: DecimalType =>
+      val t = Transforms.truncate[Any](icebergInputType, width)
+      d: Any =>  Decimal.apply(t(d.asInstanceOf[Decimal].toJavaBigDecimal).asInstanceOf[java.math.BigDecimal])
+    case _: StringType =>
+      val t = Transforms.truncate[String](Types.StringType.get(), width)

Review comment:
       I think we can use `icebergInputType` defined in the parent trait instead of the explicit Types.StringType.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dilipbiswal edited a comment on pull request #2153: Add truncate transform support for MERGE INTO

Posted by GitBox <gi...@apache.org>.
dilipbiswal edited a comment on pull request #2153:
URL: https://github.com/apache/iceberg/pull/2153#issuecomment-768741155


   @rdblue @aokolnychyi 
   Could we please check if this looks okay now ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2153: Add truncate transform support for MERGE INTO

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2153:
URL: https://github.com/apache/iceberg/pull/2153#discussion_r564734739



##########
File path: spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/expressions/TransformExpressions.scala
##########
@@ -119,3 +118,38 @@ case class IcebergBucketTransform(numBuckets: Int, child: Expression) extends Ic
 
   override def dataType: DataType = IntegerType
 }
+
+case class IcebergTruncateTransform(child: Expression, width: Int) extends IcebergTransformExpression {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  @transient lazy val truncateFunc: Any => Any = child.dataType match {
+    case _: DecimalType =>
+      val t = Transforms.truncate[Any](icebergInputType, width)
+      d: Any =>  Decimal.apply(t(d.asInstanceOf[Decimal].toJavaBigDecimal).asInstanceOf[java.math.BigDecimal])
+    case _: StringType =>
+      val t = Transforms.truncate[String](Types.StringType.get(), width)
+      s: Any => UTF8String.fromString(t(s.toString))

Review comment:
       `TruncateString` accepts a `CharSequence`, not just `String`. It's unfortunate that `UTF8String` doesn't implement `CharSequence`, but we can fairly easily get one from the underlying bytes: `StandardCharsets.UTF_8.decode(ByteBuffer.wrap(s.asInstanceOf[UTF8String].getBytes))`
   
   That will return a `CharSequence` that is not a copy of the bytes, then we can convert that back to a byte buffer and create the `UTF8String` from that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org