Posted to issues@spark.apache.org by "Eyal Zituny (JIRA)" <ji...@apache.org> on 2019/03/31 09:06:00 UTC
[jira] [Commented] (SPARK-27330) ForeachWriter is not being closed once a batch is aborted
[ https://issues.apache.org/jira/browse/SPARK-27330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806086#comment-16806086 ]
Eyal Zituny commented on SPARK-27330:
-------------------------------------
Guys, let me know if my suggested fix makes sense and I will provide a PR.
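
A rough sketch of the suggested fix, to make the discussion concrete. This is not the actual Spark code: ForeachWriter and DataWriter below are simplified stand-ins for the Spark types of the same name (the real DataWriter.commit() returns a commit message), and ClosingForeachDataWriter is a hypothetical name. Since abort() does not receive the cause of the failure, the sketch assumes a generic exception is created and passed to close().

```scala
// Simplified stand-in for org.apache.spark.sql.ForeachWriter.
abstract class ForeachWriter[T] extends Serializable {
  def open(partitionId: Long, epochId: Long): Boolean
  def process(value: T): Unit
  def close(errorOrNull: Throwable): Unit
}

// Simplified stand-in for the DataSource V2 DataWriter contract.
trait DataWriter[T] {
  def write(record: T): Unit
  def commit(): Unit
  def abort(): Unit
}

// Hypothetical writer showing the proposed behavior: abort() also closes
// the user's ForeachWriter instead of doing nothing.
class ClosingForeachDataWriter[T](
    writer: ForeachWriter[T],
    partitionId: Long,
    epochId: Long) extends DataWriter[T] {

  // The user's writer is opened as soon as the data writer is created.
  private val opened: Boolean = writer.open(partitionId, epochId)
  private var closed: Boolean = false

  // Guard so that close() runs at most once, whichever path triggers it.
  private def closeOnce(errorOrNull: Throwable): Unit = {
    if (!closed) {
      closed = true
      writer.close(errorOrNull)
    }
  }

  override def write(record: T): Unit = {
    if (opened) {
      try {
        writer.process(record)
      } catch {
        case t: Throwable =>
          closeOnce(t)
          throw t
      }
    }
  }

  // Successful task: close without an error.
  override def commit(): Unit = closeOnce(null)

  // Aborted task (for example, killed while iterating): close with an
  // error so the user's writer knows the task failed. The exception type
  // and message are assumptions of this sketch.
  override def abort(): Unit =
    closeOnce(new RuntimeException("Foreach writer aborted due to a task failure"))
}
```

The closeOnce guard is a design choice of the sketch: if process() already failed and closed the writer, a later abort() will not call close() a second time.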
> ForeachWriter is not being closed once a batch is aborted
> ---------------------------------------------------------
>
> Key: SPARK-27330
> URL: https://issues.apache.org/jira/browse/SPARK-27330
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.4.0
> Reporter: Eyal Zituny
> Priority: Major
>
> When a micro-batch is killed (interrupted) outside the actual processing done by the ForeachDataWriter (while iterating the iterator, as in the stack trace below), DataWritingSparkTask handles the interruption and calls dataWriter.abort().
> The problem is that ForeachDataWriter has an empty implementation of the abort method.
> As a result, I have encountered connections that were opened in the "open" method when the writer was created but were never closed.
> This was not the behavior before Spark 2.4.
> My suggestion is to call ForeachWriter.close() when DataWriter.abort() is called, and to pass an exception so that the foreach writer is notified that the task has failed.
>
> stack trace from the exception i have encountered:
> org.apache.spark.TaskKilledException: null
> at org.apache.spark.TaskContextImpl.killTaskIfInterrupted(TaskContextImpl.scala:149)
> at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:36)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:117)
> at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:116)
> at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
> at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:146)
> at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
> at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)