Posted to issues@spark.apache.org by "Brandon Dahler (Jira)" <ji...@apache.org> on 2021/12/15 17:59:00 UTC
[jira] [Created] (SPARK-37654) Regression - NullPointerException in Row.getSeq when field null
Brandon Dahler created SPARK-37654:
--------------------------------------
Summary: Regression - NullPointerException in Row.getSeq when field null
Key: SPARK-37654
URL: https://issues.apache.org/jira/browse/SPARK-37654
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.2.0, 3.1.2, 3.1.1
Environment: Tested against the following releases using the provided reproduction steps:
# spark-3.0.3-bin-hadoop2.7 - Succeeded
{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.3
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
{code}
# spark-3.1.2-bin-hadoop3.2 - Failed
{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
{code}
# spark-3.2.0-bin-hadoop3.2 - Failed
{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
{code}
Reporter: Brandon Dahler
h2. Description
A NullPointerException occurs in _org.apache.spark.sql.Row.getSeq(int)_ if the row contains a _null_ value at the requested index.
{code:java}
java.lang.NullPointerException
at org.apache.spark.sql.Row.getSeq(Row.scala:319)
at org.apache.spark.sql.Row.getSeq$(Row.scala:319)
at org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166)
at org.apache.spark.sql.Row.getList(Row.scala:327)
at org.apache.spark.sql.Row.getList$(Row.scala:326)
at org.apache.spark.sql.catalyst.expressions.GenericRow.getList(rows.scala:166)
...
{code}
Prior to 3.1.1, the same call did not throw an exception and instead returned _null_ in place of a _Seq_ instance.
h2. Reproduction
# Start a new spark-shell instance
# Execute the following script:
{code:scala}
import org.apache.spark.sql.Row
Row(Seq("value")).getSeq(0)
Row(Seq()).getSeq(0)
Row(null).getSeq(0)
{code}
h3. Expected Output
res2 outputs a _null_ value.
{code:java}
scala> import org.apache.spark.sql.Row
import org.apache.spark.sql.Row
scala>
scala> Row(Seq("value")).getSeq(0)
res0: Seq[Nothing] = List(value)
scala> Row(Seq()).getSeq(0)
res1: Seq[Nothing] = List()
scala> Row(null).getSeq(0)
res2: Seq[Nothing] = null
{code}
h3. Actual Output
res2 throws a NullPointerException.
{code:java}
scala> import org.apache.spark.sql.Row
import org.apache.spark.sql.Row
scala>
scala> Row(Seq("value")).getSeq(0)
res0: Seq[Nothing] = List(value)
scala> Row(Seq()).getSeq(0)
res1: Seq[Nothing] = List()
scala> Row(null).getSeq(0)
java.lang.NullPointerException
at org.apache.spark.sql.Row.getSeq(Row.scala:319)
at org.apache.spark.sql.Row.getSeq$(Row.scala:319)
at org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166)
... 47 elided
{code}
h2. Regression Source
The regression appears to have been introduced in commit [25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb|https://github.com/apache/spark/commit/25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb#diff-722324a11a0e4635a59a9debc962da2c1678d86702a9a106fd0d51188f83853bR317], which addressed [SPARK-32526|https://issues.apache.org/jira/browse/SPARK-32526].
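One plausible mechanism, stated here as an assumption rather than something this report confirms: if the linked change made _getSeq_ call a conversion method such as _toSeq_ on the value it reads from the row, a _null_ field would be dereferenced instead of being passed through. The plain Scala sketch below needs no Spark on the classpath; _oldStyle_ and _newStyle_ are hypothetical stand-ins and are not the actual Spark code.
{code:scala}
import scala.util.Try

// Hypothetical stand-ins, NOT the actual Spark implementation.

// Only casts the value, so a null field is returned as null.
def oldStyle(value: Any): Seq[String] =
  value.asInstanceOf[Seq[String]]

// Casts and then calls a method on the result, so a null field is dereferenced.
def newStyle(value: Any): Seq[String] =
  value.asInstanceOf[scala.collection.Seq[String]].toSeq

val passedThrough = oldStyle(null)    // no exception; the reference is null
val attempted = Try(newStyle(null))   // Failure wrapping a NullPointerException
{code}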
h2. Workaround
This regression can be worked around by checking _Row.isNullAt(int)_ and handling the null case in user code before calling _Row.getSeq(int)_ or _Row.getList(int)_.
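A minimal sketch of that check, suitable for pasting into spark-shell. The helper names _getSeqOption_ and _getListOrNull_ are hypothetical and not part of the Spark API; returning _Option_ for the _Seq_ case is a design choice made for this example, while the _List_ helper returns _null_ to mirror the pre-3.1.1 behavior described above.
{code:scala}
import org.apache.spark.sql.Row
import java.util.{List => JList}

// Hypothetical helper (not part of Spark): None when the field is null,
// otherwise the result of Row.getSeq wrapped in Some.
def getSeqOption[T](row: Row, i: Int): Option[Seq[T]] =
  if (row.isNullAt(i)) None else Some(row.getSeq[T](i))

// Hypothetical helper (not part of Spark): returns null for a null field
// instead of calling Row.getList, which would go through getSeq.
def getListOrNull[T](row: Row, i: Int): JList[T] =
  if (row.isNullAt(i)) null else row.getList[T](i)

val present = getSeqOption[String](Row(Seq("value")), 0)
val missing = getSeqOption[String](Row(null), 0)
val missingList = getListOrNull[String](Row(null), 0)
{code}
Because _isNullAt_ is checked first, _getSeq_ and _getList_ are never invoked on a null field, so the helpers should behave the same way on both the affected and unaffected releases listed above.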