You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by pw...@apache.org on 2013/12/31 19:13:19 UTC

[16/16] git commit: Merge pull request #289 from tdas/filestream-fix

Merge pull request #289 from tdas/filestream-fix

Bug fixes for file input stream and checkpointing

- Fixed bugs in the file input stream that led the stream to fail due to transient HDFS errors (listing files when a background thread it deleting fails caused errors, etc.)
- Updated Spark's CheckpointRDD and Streaming's CheckpointWriter to use SparkContext.hadoopConfiguration, to allow checkpoints to be written to any HDFS compatible store requiring special configuration.
- Changed the API of SparkContext.setCheckpointDir() - eliminated the unnecessary 'useExisting' parameter. Now SparkContext will always create a unique subdirectory within the user specified checkpoint directory. This is to ensure that previous checkpoint files are not accidentally overwritten.
- Fixed bug where setting checkpoint directory as a relative local path caused the checkpointing to fail.


Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/55b7e2fd
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/55b7e2fd
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/55b7e2fd

Branch: refs/heads/master
Commit: 55b7e2fdffc6c3537da69152a3d02d5be599fa1b
Parents: 50e3b8e fcd17a1
Author: Patrick Wendell <pw...@gmail.com>
Authored: Tue Dec 31 10:12:51 2013 -0800
Committer: Patrick Wendell <pw...@gmail.com>
Committed: Tue Dec 31 10:12:51 2013 -0800

----------------------------------------------------------------------
 .../scala/org/apache/spark/SparkContext.scala   |  23 +--
 .../spark/api/java/JavaSparkContext.scala       |  15 +-
 .../org/apache/spark/rdd/CheckpointRDD.scala    |  32 ++--
 .../apache/spark/rdd/RDDCheckpointData.scala    |  15 +-
 .../scala/org/apache/spark/JavaAPISuite.java    |   4 +-
 python/pyspark/context.py                       |   9 +-
 python/pyspark/tests.py                         |   4 +-
 .../org/apache/spark/streaming/Checkpoint.scala |  52 ++++---
 .../spark/streaming/StreamingContext.scala      |  32 ++--
 .../api/java/JavaStreamingContext.scala         |  13 +-
 .../streaming/dstream/FileInputDStream.scala    | 153 +++++++++++--------
 .../streaming/scheduler/JobGenerator.scala      |  67 +++++---
 .../spark/streaming/CheckpointSuite.scala       |  44 +++---
 .../spark/streaming/InputStreamsSuite.scala     |   2 +-
 14 files changed, 269 insertions(+), 196 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/55b7e2fd/core/src/main/scala/org/apache/spark/SparkContext.scala
----------------------------------------------------------------------