Posted to issues@spark.apache.org by "Shixiong Zhu (JIRA)" <ji...@apache.org> on 2016/12/27 18:44:58 UTC

[jira] [Commented] (SPARK-19013) java.util.ConcurrentModificationException when using s3 path as checkpointLocation

    [ https://issues.apache.org/jira/browse/SPARK-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781019#comment-15781019 ] 

Shixiong Zhu commented on SPARK-19013:
--------------------------------------

This is probably caused by S3 negative caching.

"a negative GET may be cached, such that even if an object is immediately created, the fact that there "wasn't" an object is still remembered." See https://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/s3-trouble/index.html#visible-s3-inconsistency for details.
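
To make the quoted behavior concrete, here is a toy Python model of negative caching. It is only a sketch: it does not use the S3 API and is not Spark's HDFSMetadataLog code, and every name in it (NegativeCachingStore, write_batch, the "offsets/" key prefix) is made up for illustration. It assumes a writer that checks for a file, writes it, and then checks again.

```python
class NegativeCachingStore:
    """Toy object store that remembers a "not found" answer for a key."""

    def __init__(self):
        self._objects = {}             # what has actually been written
        self._negative_cache = set()   # keys recently reported as missing

    def exists(self, key):
        if key in self._negative_cache:
            return False               # stale "missing" answer is served
        if key not in self._objects:
            self._negative_cache.add(key)
            return False
        return True

    def put(self, key, data):
        # The write succeeds, but the cached "missing" answer is not cleared.
        self._objects[key] = data

    def expire_cache(self):
        # Stands in for the cached negative answer eventually timing out.
        self._negative_cache.clear()


def write_batch(store, batch_id, data):
    """Hypothetical writer: check, write, then verify its own write."""
    key = f"offsets/{batch_id}"
    if store.exists(key):              # this lookup populates the negative cache
        raise RuntimeError("batch already written")
    store.put(key, data)
    return store.exists(key)           # may still be False right after the write


store = NegativeCachingStore()
visible_immediately = write_batch(store, 0, b"metadata")
store.expire_cache()
visible_later = store.exists("offsets/0")
```

In this model the single writer cannot see its own file until the cached answer expires, so a check that expects the file to be there can conclude something else interfered even though only one job is running.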

> java.util.ConcurrentModificationException when using s3 path as checkpointLocation 
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-19013
>                 URL: https://issues.apache.org/jira/browse/SPARK-19013
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.0.2
>            Reporter: Tim Chan
>
> I have a Structured Streaming job running on EMR. The job fails with this error:
> ```
> Multiple HDFSMetadataLog are using s3://mybucket/myapp org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
> ```
> There is only one instance of this stream job running.


