Posted to reviews@spark.apache.org by sryza <gi...@git.apache.org> on 2014/08/15 02:04:49 UTC
[GitHub] spark pull request: SPARK-3052. Misleading and spurious FileSystem...
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/1956
SPARK-3052. Misleading and spurious FileSystem closed errors whenever a ...
...job fails while reading from Hadoop
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sryza/spark sandy-spark-3052
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1956.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1956
----
commit 073da3774e3c8378f05df0fd046c6295bb0982c5
Author: Sandy Ryza <sa...@cloudera.com>
Date: 2014-08-15T00:04:01Z
SPARK-3052. Misleading and spurious FileSystem closed errors whenever a job fails while reading from Hadoop
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-54114366
Here's the exception:
> java.io.IOException: Filesystem closed
>         at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:703)
>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:775)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:836)
>         at java.io.DataInputStream.readFully(DataInputStream.java:195)
>         at java.io.DataInputStream.readFully(DataInputStream.java:169)
>         at parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:610)
>         at parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:360)
>         at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:100)
>         at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
>         at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
>         at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>         at org.apache.spark.scheduler.Task.run(Task.scala:51)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
Thanks, I wasn't aware of Utils.inShutdown. I'll post a patch that uses that. I haven't yet figured out how to reliably reproduce this, so I can't verify that it will safeguard against the warning in all situations where it should, but it seems like an improvement.
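The shutdown-aware close suggested here can be sketched in isolation. This is an illustrative approximation, not the actual Spark patch: the `inShutdown()` helper below mimics the hook-registration trick Spark 1.x used in `Utils.inShutdown`, and `closeReader`/the `WARN:` print are hypothetical stand-ins for the real cleanup path and `logWarning`.

```java
import java.io.IOException;

public class ShutdownAwareClose {
    // Approximation of Spark's Utils.inShutdown(): once the JVM has begun
    // shutting down, registering a new shutdown hook throws
    // IllegalStateException, so a failed registration means "in shutdown".
    static boolean inShutdown() {
        try {
            Thread hook = new Thread(() -> { });
            Runtime.getRuntime().addShutdownHook(hook);
            Runtime.getRuntime().removeShutdownHook(hook);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Close a RecordReader-like resource, warning only outside shutdown.
    static void closeReader(AutoCloseable reader) {
        try {
            reader.close();
        } catch (Exception e) {
            if (!inShutdown()) {
                // Stand-in for Spark's logWarning.
                System.out.println("WARN: Exception in RecordReader.close(): "
                        + e.getMessage());
            }
            // During shutdown the FileSystem's hook has likely already closed
            // the client, so the failure is expected and stays silent.
        }
    }

    public static void main(String[] args) {
        // Normal (non-shutdown) path: the failing close() is reported.
        closeReader(() -> { throw new IOException("Filesystem closed"); });
    }
}
```

This would suppress the misleading warning only in the shutdown race, while still surfacing genuine close() failures during normal task cleanup.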
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-52261089
QA tests have started for PR 1956. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18582/consoleFull
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-54118325
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19581/consoleFull) for PR 1956 at commit [`815813a`](https://github.com/apache/spark/commit/815813ab9bd71360b64ae45f290ff77b916a01d9).
* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `class ByteArrayChunkOutputStream(chunkSize: Int) extends OutputStream `
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/1956
Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-52267702
> Ah and the order they should be shut down in is RecordReader then
> FileSystem?
Right
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-54114804
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19581/consoleFull) for PR 1956 at commit [`815813a`](https://github.com/apache/spark/commit/815813ab9bd71360b64ae45f290ff77b916a01d9).
* This patch merges cleanly.
Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-52260943
Lowering the log level hides it, but what's the cause of these issues?
Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-54197136
Thanks Sandy, merged this.
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-52264127
QA results for PR 1956:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18582/consoleFull
Posted by ash211 <gi...@git.apache.org>.
Github user ash211 commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-52262922
Ah and the order they should be shut down in is RecordReader then
FileSystem?
Thanks for catching this -- I've seen it myself and was wondering why the
job output seemed to be correct
On Aug 14, 2014 8:28 PM, "Sandy Ryza" <no...@github.com> wrote:
> This occurs when an executor process shuts down while tasks are executing
> (e.g. because the driver disassociated or an OOME).
>
> Hadoop FileSystems register a shutdown hook to close themselves.
> RecordReaders get closed in a finally block after the tasks that they're
> used in.
>
> So there's a race between these two and I can't think of a good way to
> make one execute after the other. I'm a little confused as to why the
> HadoopRDD finally block is running at all. Some googling seems to indicate
> that finally blocks don't run during a System.exit(). And I would think a
> ShutdownHook would run after that happens anyway. So I can't claim to have
> 100% understanding of what's going on here. Spark isn't closing the
> FileSystem on its own.
>
> More generally, think logging a warning is overkill on a reader close
> error.
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/1956#issuecomment-52262225>.
>
Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-53635922
@sryza what's the stack trace printed here? I think it would be better to check whether we're shutting down (with Utils.inShutdown) and log a warning if we're not shutting down. A failed close() seems bad in other situations.
Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-54174772
I believe the failure is unrelated. I noticed it on SPARK-2461 as well.
Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-52262225
This occurs when an executor process shuts down while tasks are executing (e.g. because the driver disassociated or an OutOfMemoryError occurred).
Hadoop FileSystems register a shutdown hook to close themselves. RecordReaders get closed in a finally block after the tasks that they're used in.
So there's a race between these two and I can't think of a good way to make one execute after the other. I'm a little confused as to why the HadoopRDD finally block is running at all. Some googling seems to indicate that finally blocks don't run during a System.exit(). And I would think a ShutdownHook would run after that happens anyway. So I can't claim to have 100% understanding of what's going on here. Spark isn't closing the FileSystem on its own.
More generally, I think logging a warning is overkill on a reader close error.
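The System.exit() puzzle raised above can be checked in isolation. A minimal demo (not from the PR itself) shows that when System.exit() is called inside a try block, shutdown hooks run but the finally block does not, which is consistent with the confusion about why the HadoopRDD finally block would run at all on a normal exit path:

```java
public class ExitVsFinally {
    public static void main(String[] args) {
        // Hooks registered before exit do run during JVM shutdown,
        // just as Hadoop FileSystem's cache-closing hook does.
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> System.out.println("shutdown hook ran")));
        try {
            System.out.println("in try");
            System.exit(0); // JVM shutdown begins here
        } finally {
            // Never reached: System.exit() terminates the JVM before
            // the finally block can execute.
            System.out.println("finally ran");
        }
    }
}
```

So if the finally block is observed running, the executor is presumably dying through some path other than a plain System.exit() while the task thread is still inside it.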