Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2014/12/30 11:51:13 UTC

[jira] [Commented] (SPARK-5001) BlockRDD removed unreasonably in streaming

    [ https://issues.apache.org/jira/browse/SPARK-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260989#comment-14260989 ] 

Sean Owen commented on SPARK-5001:
----------------------------------

[~hanhg] You should open a pull request on GitHub instead of posting a patch; see http://github.com/apache/spark . In general you'd want to design your system so that completion times don't vary so widely that old RDDs finish after newer ones, but there may indeed be ways to tighten this up.

> BlockRDD removed unreasonably in streaming
> ------------------------------------------
>
>                 Key: SPARK-5001
>                 URL: https://issues.apache.org/jira/browse/SPARK-5001
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.0.2, 1.1.1, 1.2.0
>            Reporter: hanhonggen
>         Attachments: fix_bug_BlockRDD_removed_not_reasonablly_in_streaming.patch
>
>
> I've counted messages using the Kafka input stream of spark-1.1.1. The test app failed when a later batch job completed sooner than an earlier one. In the source code, BlockRDDs older than (time - rememberDuration) are removed in clearMetadata after a job completes, so the still-running earlier job aborts because its blocks can no longer be found. The relevant logs are as follows:
> 2014-12-25 14:07:12(Logging.scala:59)[sparkDriver-akka.actor.default-dispatcher-14] INFO :Starting job streaming job 1419487632000 ms.0 from job set of time 1419487632000 ms
> 2014-12-25 14:07:15(Logging.scala:59)[sparkDriver-akka.actor.default-dispatcher-14] INFO :Starting job streaming job 1419487635000 ms.0 from job set of time 1419487635000 ms
> 2014-12-25 14:07:15(Logging.scala:59)[sparkDriver-akka.actor.default-dispatcher-15] INFO :Finished job streaming job 1419487635000 ms.0 from job set of time 1419487635000 ms
> 2014-12-25 14:07:15(Logging.scala:59)[sparkDriver-akka.actor.default-dispatcher-16] INFO :Removing blocks of RDD BlockRDD[3028] at createStream at TestKafka.java:144 of time 1419487635000 ms from DStream clearMetadata
> java.lang.Exception: Could not compute split, block input-0-1419487631400 not found for 3028
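The race the reporter describes can be sketched in standalone form. This is a hedged illustration, not the actual Spark source: the class, method names, and timestamps below are stand-ins for DStream's generated-RDD bookkeeping, and the remember duration is a hypothetical value chosen to match the 3-second batch interval in the logs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClearMetadataSketch {
    // batch time (ms) -> RDD id; a stand-in for the stream's generated-RDD map
    private final TreeMap<Long, Integer> generatedRDDs = new TreeMap<>();
    private final long rememberDurationMs;

    ClearMetadataSketch(long rememberDurationMs) {
        this.rememberDurationMs = rememberDurationMs;
    }

    void register(long batchTimeMs, int rddId) {
        generatedRDDs.put(batchTimeMs, rddId);
    }

    // Called when the batch at `timeMs` completes: metadata for every RDD
    // generated at or before (timeMs - rememberDuration) is dropped, even if
    // an earlier, slower batch is still computing against those blocks.
    List<Integer> clearMetadata(long timeMs) {
        long threshold = timeMs - rememberDurationMs;
        List<Integer> removed = new ArrayList<>();
        // headMap(threshold, true) = all batches at or before the threshold
        for (Map.Entry<Long, Integer> e
                : generatedRDDs.headMap(threshold, true).entrySet()) {
            removed.add(e.getValue());
        }
        generatedRDDs.keySet().removeIf(t -> t <= threshold);
        return removed;
    }

    public static void main(String[] args) {
        ClearMetadataSketch tracker = new ClearMetadataSketch(3000L);
        tracker.register(1419487632000L, 3028);  // earlier batch, still running
        tracker.register(1419487635000L, 3029);  // later batch, finishes first
        // The later batch completes first; its cleanup removes RDD 3028 while
        // the 1419487632000 ms job may still need it -> "block not found".
        System.out.println(tracker.clearMetadata(1419487635000L));
    }
}
```

Under these assumptions, the batch of 1419487635000 ms finishing first is enough to evict the BlockRDD of the slower 1419487632000 ms batch, which reproduces the failure mode in the logs above.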



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org