Posted to reviews@spark.apache.org by "bersprockets (via GitHub)" <gi...@apache.org> on 2023/12/14 21:11:45 UTC

[PR] [SPARK-46289][SQL] Support ordering UDTs in interpreted mode [spark]

bersprockets opened a new pull request, #44361:
URL: https://github.com/apache/spark/pull/44361

   ### What changes were proposed in this pull request?
   
   When comparing two UDT values in interpreted mode, treat each value as an instance of the UDT's underlying type.
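   
   A minimal sketch of the idea (names here are illustrative, not necessarily the PR's actual diff): before an ordering is resolved, unwrap a UDT to its underlying SQL type, so the interpreted path compares the same values the generated code compares.
   
   ```scala
   import org.apache.spark.sql.types.{DataType, UserDefinedType}
   
   // Illustrative only: recursively strip the UDT wrapper down to the
   // underlying SQL type before looking up a physical ordering for it.
   def underlyingType(dt: DataType): DataType = dt match {
     case udt: UserDefinedType[_] => underlyingType(udt.sqlType)
     case other => other
   }
   ```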
   
   ### Why are the changes needed?
   
   Consider the following code:
   ```scala
   import org.apache.spark.ml.linalg.DenseVector
   
   val df = Seq.tabulate(30) { x =>
     (x, x + 1, x + 2, new DenseVector(Array(x / 100.0, (x + 1) / 100.0, (x + 3) / 100.0)))
   }.toDF("id", "c1", "c2", "c3")
   
   df.createOrReplaceTempView("df")
   
   // This works: the comparator is produced by whole-stage codegen.
   sql("select * from df order by c3").collect
   
   // Force interpreted (non-codegen) execution.
   sql("set spark.sql.codegen.wholeStage=false")
   sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
   
   // This fails: the interpreted ordering does not handle the UDT column.
   sql("select * from df order by c3").collect
   ```
   The first collect succeeds. The second, which runs in interpreted mode, fails with the following exception:
   ```
   org.apache.spark.SparkIllegalArgumentException: Type UninitializedPhysicalType does not support ordered operations.
   	at org.apache.spark.sql.errors.QueryExecutionErrors$.orderedOperationUnsupportedByDataTypeError(QueryExecutionErrors.scala:348)
   	at org.apache.spark.sql.catalyst.types.UninitializedPhysicalType$.ordering(PhysicalDataType.scala:332)
   	at org.apache.spark.sql.catalyst.types.UninitializedPhysicalType$.ordering(PhysicalDataType.scala:329)
   	at org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:60)
   	at org.apache.spark.sql.catalyst.expressions.InterpretedOrdering.compare(ordering.scala:39)
   	at org.apache.spark.sql.execution.UnsafeExternalRowSorter$RowComparator.compare(UnsafeExternalRowSorter.java:254)
   ```
   The code generator already emits comparison code that treats a UDT as its underlying SQL type (`UserDefinedType#sqlType`). See [here](https://github.com/apache/spark/blob/c045a425bf0c472f164e3ef75a8a2c68d72d61d3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L721).
   
   The interpreted path, by contrast, tries to compare the values as UDTs, for which no physical ordering is defined, and so fails. This PR brings the interpreted path in line with the generated code (see the sketch above).
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New test.
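   
   As a sketch of what such a test might look like (hypothetical code, assuming a suite that mixes in Spark's `QueryTest` and `SharedSparkSession` test helpers; the PR's actual test may differ):
   
   ```scala
   import org.apache.spark.ml.linalg.DenseVector
   import org.apache.spark.sql.Row
   import org.apache.spark.sql.internal.SQLConf
   import testImplicits._  // provided by SharedSparkSession
   
   // Force the interpreted (non-codegen) path, then sort by the UDT column.
   withSQLConf(
       SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false",
       SQLConf.CODEGEN_FACTORY_MODE.key -> "NO_CODEGEN") {
     val df = Seq((2, new DenseVector(Array(0.2))), (1, new DenseVector(Array(0.1))))
       .toDF("id", "v")
     checkAnswer(df.orderBy("v").select("id"), Seq(Row(1), Row(2)))
   }
   ```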
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   



Re: [PR] [SPARK-46289][SQL] Support ordering UDTs in interpreted mode [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #44361: [SPARK-46289][SQL] Support ordering UDTs in interpreted mode
URL: https://github.com/apache/spark/pull/44361

