Posted to user@spark.apache.org by Srinivas V <sr...@gmail.com> on 2020/05/23 11:13:55 UTC

[structured streaming] [stateful] Null value appeared in non-nullable field

Hello,
 I am listening to a Kafka topic through Spark Structured Streaming [2.4.5].
After processing messages for a few minutes, I am getting the below
NullPointerException. I have three beans used here: 1. Event 2. StateInfo
3. SessionUpdateInfo. I suspect the problem is with StateInfo:
it might be failing while writing state to HDFS, or it could be failing
while I update accumulators. But why would it fail for some events but not
for others? Once it fails, it stops the streaming query.
When I send all fields null except EventId in my testing, it works fine.
Any idea what could be happening?
Attaching the full stack trace as well.
This is a YARN cluster, saving state in HDFS.

Exception:

20/05/23 09:46:46 ERROR
org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask:
Aborting commit for partition 42 (task 118121, attempt 9, stage 824.0)
20/05/23 09:46:46 ERROR
org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask:
Aborted commit for partition 42 (task 118121, attempt 9, stage 824.0)
20/05/23 09:46:46 ERROR org.apache.spark.executor.Executor: Exception
in task 42.9 in stage 824.0 (TID 118121)
java.lang.NullPointerException: Null value appeared in non-nullable field:
top level input bean
If the schema is inferred from a Scala tuple/case class, or a Java
bean, please try to use scala.Option[_] or other nullable types (e.g.
java.lang.Integer instead of int/scala.Int).
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.serializefromobject_doConsume_0$(Unknown
Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:117)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:116)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:146)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
20/05/23 09:47:48 ERROR
org.apache.spark.executor.CoarseGrainedExecutorBackend: RECEIVED
SIGNAL TERM
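
[Editor's note: the exception's own hint points at encoder nullability. The
minimal sketch below illustrates that hint with plain Java, outside of Spark;
the class and field names (EventPrimitive, EventBoxed, count) are hypothetical
and not the poster's actual beans. In Spark 2.4, a bean field of a primitive
type is inferred as non-nullable, while the boxed wrapper is nullable.]

```java
// Sketch: why boxed types matter for nullable fields. Names are
// hypothetical; this is not Spark code, just the Java-level distinction
// the error message is pointing at.
public class EventNullability {

    // A bean with a primitive field: Spark's bean encoder would infer this
    // column as non-nullable, because an int simply cannot hold null.
    static class EventPrimitive {
        private int count;
        public int getCount() { return count; }
        public void setCount(int c) { count = c; }
    }

    // The same bean with a boxed field: null is representable, so the
    // encoder would infer the column as nullable.
    static class EventBoxed {
        private Integer count;
        public Integer getCount() { return count; }
        public void setCount(Integer c) { count = c; }
    }

    public static void main(String[] args) {
        EventBoxed e = new EventBoxed();
        e.setCount(null);                          // legal: field stays null
        System.out.println(e.getCount() == null);  // prints true
        // EventPrimitive has no way to express the same null at all.
    }
}
```

Note, however, that the message above says "top level input bean", which
suggests the bean object itself was null (e.g. a mapping function returned
null for some inputs), not merely one of its fields.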



Regards

Srini V

Re: [structured streaming] [stateful] Null value appeared in non-nullable field

Posted by Jungtaek Lim <ka...@gmail.com>.
Hi,

With only a stack trace there is nothing to look into. It would be better to
provide a simple reproducer (code, and problematic inputs) so that someone
can give it a try.

You may also want to give 3.0.0 a try: RC2 is better to test
against, but preview2 would be easier for end users to test.

On Sat, May 23, 2020 at 8:14 PM, Srinivas V <sr...@gmail.com> wrote:

> Hello,
>  I am listening to a Kafka topic through Spark Structured Streaming
> [2.4.5]. After processing messages for a few minutes, I am getting the below
> NullPointerException. I have three beans used here: 1. Event 2. StateInfo
> 3. SessionUpdateInfo. I suspect the problem is with StateInfo:
> it might be failing while writing state to HDFS, or it could be failing
> while I update accumulators. But why would it fail for some events but not
> for others? Once it fails, it stops the streaming query.
> When I send all fields null except EventId in my testing, it works fine.
> Any idea what could be happening?
> Attaching the full stack trace as well.
> This is a YARN cluster, saving state in HDFS.
>
> [quoted exception and stack trace identical to the original message above]
>
>
>
> Regards
>
> Srini V
>
>