Posted to reviews@spark.apache.org by sryza <gi...@git.apache.org> on 2014/08/15 02:04:49 UTC
[GitHub] spark pull request: SPARK-3052. Misleading and spurious FileSystem...
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/1956
SPARK-3052. Misleading and spurious FileSystem closed errors whenever a ...
...job fails while reading from Hadoop
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sryza/spark sandy-spark-3052
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1956.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1956
----
commit 073da3774e3c8378f05df0fd046c6295bb0982c5
Author: Sandy Ryza <sa...@cloudera.com>
Date: 2014-08-15T00:04:01Z
SPARK-3052. Misleading and spurious FileSystem closed errors whenever a job fails while reading from Hadoop
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-54114366
Here's the exception:
> java.io.IOException: Filesystem closed
>         at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:703)
>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:775)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:836)
>         at java.io.DataInputStream.readFully(DataInputStream.java:195)
>         at java.io.DataInputStream.readFully(DataInputStream.java:169)
>         at parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:610)
>         at parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:360)
>         at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:100)
>         at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
>         at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
>         at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>         at org.apache.spark.scheduler.Task.run(Task.scala:51)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
Thanks, I wasn't aware of Utils.inShutdown. I'll post a patch that uses that. I haven't yet figured out how to reliably reproduce this, so I can't verify that it will safeguard against the warning in all situations where it should, but it seems like an improvement.
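The shutdown-aware close suggested here can be sketched in isolation. This is an illustrative approximation, not the actual Spark patch: the `inShutdown()` helper below mimics the hook-registration trick Spark 1.x used in `Utils.inShutdown`, and `closeReader`/the `WARN:` print are hypothetical stand-ins for the real cleanup path and `logWarning`.

```java
import java.io.IOException;

public class ShutdownAwareClose {
    // Approximation of Spark's Utils.inShutdown(): once the JVM has begun
    // shutting down, registering a new shutdown hook throws
    // IllegalStateException, so a failed registration means "in shutdown".
    static boolean inShutdown() {
        try {
            Thread hook = new Thread(() -> { });
            Runtime.getRuntime().addShutdownHook(hook);
            Runtime.getRuntime().removeShutdownHook(hook);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Close a RecordReader-like resource, warning only outside shutdown.
    static void closeReader(AutoCloseable reader) {
        try {
            reader.close();
        } catch (Exception e) {
            if (!inShutdown()) {
                // Stand-in for Spark's logWarning.
                System.out.println("WARN: Exception in RecordReader.close(): "
                        + e.getMessage());
            }
            // During shutdown the FileSystem's hook has likely already closed
            // the client, so the failure is expected and stays silent.
        }
    }

    public static void main(String[] args) {
        // Normal (non-shutdown) path: the failing close() is reported.
        closeReader(() -> { throw new IOException("Filesystem closed"); });
    }
}
```

This would suppress the misleading warning only in the shutdown race, while still surfacing genuine close() failures during normal task cleanup.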
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-52261089
QA tests have started for PR 1956. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18582/consoleFull
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-54118325
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19581/consoleFull) for PR 1956 at commit [`815813a`](https://github.com/apache/spark/commit/815813ab9bd71360b64ae45f290ff77b916a01d9).
* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `class ByteArrayChunkOutputStream(chunkSize: Int) extends OutputStream `
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/1956
Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-52267702
> Ah and the order they should be shut down in is RecordReader then
> FileSystem?
Right
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-54114804
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19581/consoleFull) for PR 1956 at commit [`815813a`](https://github.com/apache/spark/commit/815813ab9bd71360b64ae45f290ff77b916a01d9).
* This patch merges cleanly.
Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-52260943
Lowering the log level hides it, but what's the cause of these issues?
Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-54197136
Thanks Sandy, merged this.
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-52264127
QA results for PR 1956:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18582/consoleFull
Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-52262922
Ah and the order they should be shut down in is RecordReader then
FileSystem?
Thanks for catching this -- I've seen it myself and was wondering why the
job output seemed to be correct
On Aug 14, 2014 8:28 PM, "Sandy Ryza" <no...@github.com> wrote:
> This occurs when an executor process shuts down while tasks are executing
> (e.g. because the driver disassociated or an OOME).
>
> Hadoop FileSystems register a shutdown hook to close themselves.
> RecordReaders get closed in a finally block after the tasks that they're
> used in.
>
> So there's a race between these two and I can't think of a good way to
> make one execute after the other. I'm a little confused as to why the
> HadoopRDD finally block is running at all. Some googling seems to indicate
> that finally blocks don't run during a System.exit(). And I would think a
> ShutdownHook would run after that happens anyway. So I can't claim to have
> 100% understanding of what's going on here. Spark isn't closing the
> FileSystem on its own.
>
> More generally, think logging a warning is overkill on a reader close
> error.
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/1956#issuecomment-52262225>.
>
Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-53635922
@sryza what's the stack trace printed here? I think it would be better to check whether we're shutting down (with Utils.inShutdown) and log a warning if we're not shutting down. A failed close() seems bad in other situations.
Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-54174772
I believe the failure is unrelated. I noticed it on SPARK-2461 as well.
Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-52262225
This occurs when an executor process shuts down while tasks are executing (e.g. because the driver disassociated or an OutOfMemoryError occurred).
Hadoop FileSystems register a shutdown hook to close themselves. RecordReaders get closed in a finally block after the tasks that they're used in.
So there's a race between these two and I can't think of a good way to make one execute after the other. I'm a little confused as to why the HadoopRDD finally block is running at all. Some googling seems to indicate that finally blocks don't run during a System.exit(). And I would think a ShutdownHook would run after that happens anyway. So I can't claim to have 100% understanding of what's going on here. Spark isn't closing the FileSystem on its own.
More generally, I think logging a warning is overkill on a reader close error.
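The System.exit() puzzle raised above can be checked in isolation. A minimal demo (not from the PR itself) shows that when System.exit() is called inside a try block, shutdown hooks run but the finally block does not, which is consistent with the confusion about why the HadoopRDD finally block would run at all on a normal exit path:

```java
public class ExitVsFinally {
    public static void main(String[] args) {
        // Hooks registered before exit do run during JVM shutdown,
        // just as Hadoop FileSystem's cache-closing hook does.
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> System.out.println("shutdown hook ran")));
        try {
            System.out.println("in try");
            System.exit(0); // JVM shutdown begins here
        } finally {
            // Never reached: System.exit() terminates the JVM before
            // the finally block can execute.
            System.out.println("finally ran");
        }
    }
}
```

So if the finally block is observed running, the executor is presumably dying through some path other than a plain System.exit() while the task thread is still inside it.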