Posted to issues@spark.apache.org by "Bruce Robbins (Jira)" <ji...@apache.org> on 2022/12/31 23:39:00 UTC

[jira] [Commented] (SPARK-41804) InterpretedUnsafeProjection doesn't properly handle an array of UDTs

    [ https://issues.apache.org/jira/browse/SPARK-41804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653367#comment-17653367 ] 

Bruce Robbins commented on SPARK-41804:
---------------------------------------

I think I have a handle on what's going on here...
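
My working theory: when the interpreted path writes an array, every variable-length element type (string, struct, array, map, decimal) occupies a fixed 8-byte (offset, size) word in the UnsafeArrayData, so the per-element stride has to be 8 for those types. A UDT that isn't first unwrapped to its underlying sqlType instead falls through to DataType.defaultSize, so the writer strides through the element region with the wrong width while the reader still assumes 8-byte words. Here is a minimal sketch of the mismatch for the ml VectorUDT, runnable in a 3.4 spark-shell; the elementSize helper below is my illustration of the sizing rule, not the actual InterpretedUnsafeProjection code:

{noformat}
import org.apache.spark.ml.linalg.SQLDataTypes
import org.apache.spark.sql.types._

// Element size as an unsafe array writer must see it: variable-length types
// get a fixed 8-byte (offset, size) word; fixed-width types use defaultSize.
def elementSize(dt: DataType): Int = dt match {
  case udt: UserDefinedType[_] => elementSize(udt.sqlType) // unwrap the UDT first
  case NullType | StringType | BinaryType | CalendarIntervalType |
       _: DecimalType | _: StructType | _: ArrayType | _: MapType => 8
  case other => other.defaultSize
}

val vectorType = SQLDataTypes.VectorType // the ml VectorUDT, backed by a struct

// Without the UDT case above, a struct-backed UDT falls through to
// defaultSize, i.e. the sum of its struct fields' default sizes, not 8.
println(vectorType.defaultSize)  // 17 (byte 1 + int 4 + array<int> 4 + array<double> 8)
println(elementSize(vectorType)) // 8
{noformat}

With a 17-byte stride where an 8-byte one is expected, the (offset, size) words read back as garbage, which would explain why the symptoms below vary from run to run: bogus element counts, negative allocation sizes, heap exhaustion, and the occasional SIGBUS.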

> InterpretedUnsafeProjection doesn't properly handle an array of UDTs
> --------------------------------------------------------------------
>
>                 Key: SPARK-41804
>                 URL: https://issues.apache.org/jira/browse/SPARK-41804
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Bruce Robbins
>            Priority: Major
>
> Reproduction steps:
> {noformat}
> // create a file of vector data
> import org.apache.spark.ml.linalg.{DenseVector, Vector}
> case class TestRow(varr: Array[Vector])
> val values = Array(0.1d, 0.2d, 0.3d)
> val dv = new DenseVector(values).asInstanceOf[Vector]
> val ds = Seq(TestRow(Array(dv, dv))).toDS
> ds.coalesce(1).write.mode("overwrite").format("parquet").save("vector_data")
> // this works
> spark.read.format("parquet").load("vector_data").collect
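> // disable codegen so collect() goes through the interpreted path (InterpretedUnsafeProjection)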
> sql("set spark.sql.codegen.wholeStage=false")
> sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")
> // this will get an error
> spark.read.format("parquet").load("vector_data").collect
> {noformat}
> The error varies each time you run it, e.g.:
> {noformat}
> Sparse vectors require that the dimension of the indices match the dimension of the values.
> You provided 2 indices and  6619240 values.
> {noformat}
> or
> {noformat}
> org.apache.spark.SparkRuntimeException: Error while decoding: java.lang.NegativeArraySizeException
> {noformat}
> or
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>   at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.toDoubleArray(UnsafeArrayData.java:414)
> {noformat}
> or
> {noformat}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGBUS (0xa) at pc=0x00000001120c9d30, pid=64213, tid=0x0000000000001003
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_311-b11) (build 1.8.0_311-b11)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.311-b11 mixed mode bsd-amd64 compressed oops)
> # Problematic frame:
> # V  [libjvm.dylib+0xc9d30]  acl_CopyRight+0x29
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /<my-local-directory>/hs_err_pid64213.log
> Compiled method (nm)  582142 11318     n 0       sun.misc.Unsafe::copyMemory (native)
>  total in heap  [0x000000011efa8890,0x000000011efa8be8] = 856
>  relocation     [0x000000011efa89b8,0x000000011efa89f8] = 64
>  main code      [0x000000011efa8a00,0x000000011efa8be8] = 488
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #
> {noformat}
