Posted to issues@spark.apache.org by "Bruce Robbins (Jira)" <ji...@apache.org> on 2022/12/31 23:24:00 UTC

[jira] [Updated] (SPARK-41804) InterpretedUnsafeProjection doesn't properly handle an array of UDTs

     [ https://issues.apache.org/jira/browse/SPARK-41804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bruce Robbins updated SPARK-41804:
----------------------------------
    Description: 
Reproduction steps:
{noformat}
// create a file of vector data
import org.apache.spark.ml.linalg.{DenseVector, Vector}

case class TestRow(varr: Array[Vector])
val values = Array(0.1d, 0.2d, 0.3d)
val dv = new DenseVector(values).asInstanceOf[Vector]

val ds = Seq(TestRow(Array(dv, dv))).toDS
ds.coalesce(1).write.mode("overwrite").format("parquet").save("vector_data")

// this works
spark.read.format("parquet").load("vector_data").collect

sql("set spark.sql.codegen.wholeStage=false")
sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")

// this fails with an error
spark.read.format("parquet").load("vector_data").collect
{noformat}
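The two SET commands disable whole-stage codegen and force the codegen factory to skip codegen entirely, so the scan's output projection goes through InterpretedUnsafeProjection, the code path under suspicion. A minimal programmatic equivalent, assuming the usual spark-shell spark session:
{noformat}
// Equivalent to the SET commands above: disable whole-stage codegen and
// force interpreted (non-codegen) expression evaluation.
spark.conf.set("spark.sql.codegen.wholeStage", "false")
spark.conf.set("spark.sql.codegen.factoryMode", "NO_CODEGEN")
{noformat}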
The error varies each time you run it, e.g.:
{noformat}
Sparse vectors require that the dimension of the indices match the dimension of the values.
You provided 2 indices and  6619240 values.
{noformat}
or
{noformat}
org.apache.spark.SparkRuntimeException: Error while decoding: java.lang.NegativeArraySizeException
{noformat}
or
{noformat}
java.lang.OutOfMemoryError: Java heap space
  at org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.toDoubleArray(UnsafeArrayData.java:414)
{noformat}
or
{noformat}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0x00000001120c9d30, pid=64213, tid=0x0000000000001003
#
# JRE version: Java(TM) SE Runtime Environment (8.0_311-b11) (build 1.8.0_311-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.311-b11 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.dylib+0xc9d30]  acl_CopyRight+0x29
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /<my-local-directory>/hs_err_pid64213.log
Compiled method (nm)  582142 11318     n 0       sun.misc.Unsafe::copyMemory (native)
 total in heap  [0x000000011efa8890,0x000000011efa8be8] = 856
 relocation     [0x000000011efa89b8,0x000000011efa89f8] = 64
 main code      [0x000000011efa8a00,0x000000011efa8be8] = 488
Compiled method (nm)  582142 11318     n 0       sun.misc.Unsafe::copyMemory (native)
 total in heap  [0x000000011efa8890,0x000000011efa8be8] = 856
 relocation     [0x000000011efa89b8,0x000000011efa89f8] = 64
 main code      [0x000000011efa8a00,0x000000011efa8be8] = 488
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
{noformat}
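The variety of failures points at out-of-bounds reads of the unsafe buffer. A plausible (unconfirmed) explanation is that the interpreted writer computes the per-element size of the array from the UDT itself rather than from the UDT's underlying SQL type (a struct, in the case of ml vectors), so element offsets and lengths land in the wrong places. A minimal sketch of the kind of unwrapping that would be needed, using Catalyst's UserDefinedType API; the helper name is hypothetical:
{noformat}
import org.apache.spark.sql.types.{DataType, UserDefinedType}

// Hypothetical helper: recursively replace a UDT with the SQL type it is
// physically stored as, so element sizes reflect the actual layout.
def physicalType(dt: DataType): DataType = dt match {
  case udt: UserDefinedType[_] => physicalType(udt.sqlType)
  case other => other
}
{noformat}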


> InterpretedUnsafeProjection doesn't properly handle an array of UDTs
> --------------------------------------------------------------------
>
>                 Key: SPARK-41804
>                 URL: https://issues.apache.org/jira/browse/SPARK-41804
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Bruce Robbins
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org