Posted to issues@spark.apache.org by "Dhruve Ashar (JIRA)" <ji...@apache.org> on 2016/10/04 22:00:22 UTC

[jira] [Commented] (SPARK-17417) Fix # of partitions for RDD while checkpointing - Currently limited by 10000(%05d)

    [ https://issues.apache.org/jira/browse/SPARK-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546792#comment-15546792 ] 

Dhruve Ashar commented on SPARK-17417:
--------------------------------------

[~srowen] AFAIU the checkpointing mechanism in Spark core, recovery of an RDD from a checkpoint is limited to a single application attempt. Spark Streaming, on the other hand, states that it can recover metadata/RDDs from checkpointed data across application attempts. Please correct me if I have missed something here. With this understanding it wouldn't be necessary to add code to parse the old file-name format, as recovery would be done with the same Spark jar that was used to launch the application.

Also, why is it that we are not cleaning up the checkpointed directory on sc.close?
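For context, here is a minimal sketch of how reliable RDD checkpointing in Spark core is typically set up; the app name, master, and checkpoint path are placeholders chosen for illustration. It shows where the part files discussed in this issue come from, and that the checkpoint directory is left behind on disk after the context is stopped, which is what the question above is about.

    // Minimal sketch of reliable RDD checkpointing (placeholder names/paths).
    import org.apache.spark.{SparkConf, SparkContext}

    object CheckpointSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("checkpoint-sketch").setMaster("local[2]")
        val sc = new SparkContext(conf)

        // Checkpoint data (the part files discussed in this issue) is written
        // under this directory.
        sc.setCheckpointDir("/tmp/checkpoint-sketch")

        val rdd = sc.parallelize(1 to 1000, numSlices = 8).map(_ * 2)
        rdd.checkpoint() // marks the RDD; the checkpoint is written on the next action
        rdd.count()      // triggers the job and materializes the checkpoint files

        sc.stop()        // the checkpoint directory remains on disk afterwards
      }
    }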

> Fix # of partitions for RDD while checkpointing - Currently limited by 10000(%05d)
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-17417
>                 URL: https://issues.apache.org/jira/browse/SPARK-17417
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Dhruve Ashar
>
> Spark currently assumes # of partitions to be less than 100000 and uses %05d padding. 
> If we exceed this number, the sort logic in ReliableCheckpointRDD gets messed up and fails, because the part files are sorted and compared as strings. 
> This makes the filenames sort as part-10000, part-100000, ... instead of part-10000, part-10001, ..., part-100000, and the job fails while reconstructing the checkpointed RDD (see the sketch below). 
> Possible solutions: 
> - Bump the padding to allow more partitions, or
> - Sort the part files by extracting the numeric index from the filename as a string, and then verify the RDD
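
A minimal, self-contained sketch (plain Scala, not Spark code) of the ordering problem described above, assuming the "part-%05d" file-name format mentioned in this issue: once partition indices need more than five digits, a plain string sort of the file names no longer matches the numeric order, while sorting on the extracted index does.

    object PartFileOrdering extends App {
      // File names produced by the %05d padding for a few partition indices.
      val names = Seq(9999, 99999, 100000, 100001).map(i => "part-%05d".format(i))

      // Lexicographic (string) comparison: part-99999 sorts after part-100001.
      println(names.sorted)
      // List(part-09999, part-100000, part-100001, part-99999)

      // Comparing the extracted numeric index restores the intended order.
      println(names.sortBy(_.stripPrefix("part-").toLong))
      // List(part-09999, part-99999, part-100000, part-100001)
    }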



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org