Posted to user@spark.apache.org by Steven Cox <sc...@renci.org> on 2014/07/04 03:45:11 UTC

No FileSystem for scheme: hdfs

...and a real subject line.
________________________________
From: Steven Cox [scox@renci.org]
Sent: Thursday, July 03, 2014 9:21 PM
To: user@spark.apache.org
Subject:


Folks, I have a program derived from the Kafka streaming wordcount example which works fine standalone.


Running on Mesos is not working so well. For starters, I get the error below "No FileSystem for scheme: hdfs".


I've looked at a lot of promising comments on this issue, so now I have:

* Every jar under hadoop in my classpath

* Hadoop HDFS and Client in my pom.xml
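Concretely, the pom.xml entries look like this (version numbers here are illustrative; pick whatever matches your cluster):

```xml
<!-- Illustrative dependency coordinates; 2.3.0 is an example version -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.3.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>2.3.0</version>
</dependency>
```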


I find it odd that the app writes checkpoint files to HDFS successfully for a couple of cycles then throws this exception. This would suggest the problem is not with the syntax of the hdfs URL, for example.


Any thoughts on what I'm missing?


Thanks,


Steve


Mesos : 0.18.2

Spark : 0.9.1



14/07/03 21:14:20 WARN TaskSetManager: Lost TID 296 (task 1514.0:0)
14/07/03 21:14:20 WARN TaskSetManager: Lost TID 297 (task 1514.0:1)
14/07/03 21:14:20 WARN TaskSetManager: Lost TID 298 (task 1514.0:0)
14/07/03 21:14:20 ERROR TaskSetManager: Task 1514.0:0 failed 10 times; aborting job
14/07/03 21:14:20 ERROR JobScheduler: Error running job streaming job 1404436460000 ms.0
org.apache.spark.SparkException: Job aborted: Task 1514.0:0 failed 10 times (most recent failure: Exception failure: java.io.IOException: No FileSystem for scheme: hdfs)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)



Re: No FileSystem for scheme: hdfs

Posted by Juan Rodríguez Hortalá <ju...@gmail.com>.
Hi,

To cope with the META-INF issue that Sean is pointing out, my solution
is to replace maven-assembly-plugin with maven-shade-plugin, using the
ServicesResourceTransformer
(http://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer)
"to merge multiple implementations of the same interface into one service
entry".
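A sketch of the relevant build section (the plugin version here is illustrative):

```xml
<!-- maven-shade-plugin in place of maven-assembly-plugin; the
     ServicesResourceTransformer concatenates META-INF/services files
     from all jars instead of letting one copy overwrite the others. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```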

Hope that helps,

Greetings



2014-07-04 9:50 GMT+02:00 Sean Owen <so...@cloudera.com>:


Re: No FileSystem for scheme: hdfs

Posted by Sean Owen <so...@cloudera.com>.
"No file system for scheme", in the past for me, has meant that files
in META-INF/services have collided when building an uber jar. There's
a sort-of-obscure mechanism in Java for registering implementations of
a service's interface, and Hadoop uses it for FileSystem. It consists
of listing classes in a file in META-INF/services. If two jars have a
copy and they collide and one overwrites the other -- or you miss
packaging these files -- you can end up with this error. Ring any
bells?
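A quick, self-contained sketch of that registration mechanism, using the JDK's own FileSystemProvider service purely as an illustration (Hadoop registers FileSystem implementations the same way, through a META-INF/services/org.apache.hadoop.fs.FileSystem file):

```java
import java.nio.file.spi.FileSystemProvider;

public class ListProviders {
    public static void main(String[] args) {
        // installedProviders() is populated via the ServiceLoader mechanism:
        // provider classes listed in META-INF/services entries (or module
        // declarations) on the classpath are discovered and instantiated.
        // The first element is always the default "file" provider.
        for (FileSystemProvider p : FileSystemProvider.installedProviders()) {
            System.out.println(p.getScheme() + " -> " + p.getClass().getName());
        }
    }
}
```

If the hdfs entry never makes it into the merged META-INF/services/org.apache.hadoop.fs.FileSystem of the uber jar, Hadoop has no class to map the hdfs scheme to, which is exactly this IOException. You can inspect the merged file with `unzip -p <your-assembly-jar> META-INF/services/org.apache.hadoop.fs.FileSystem`.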

On Fri, Jul 4, 2014 at 2:45 AM, Steven Cox <sc...@renci.org> wrote:

RE: No FileSystem for scheme: hdfs

Posted by Steven Cox <sc...@renci.org>.
Thanks for the help folks.

Adding the config files was necessary but not sufficient.

I also had hadoop 1.0.4 classes on the classpath because of a bad jar:

   spark-0.9.1/jars/spark-assembly-0.9.1-hadoop1.0.4.jar

was in my spark executor tar.gz (stored in HDFS).

I believe this was due to a bit of unfortunate devops hygiene during the install of our new cluster.

After ensuring the pom referenced hadoop 2.3.0 and rebuilding with:

   mvn -Pyarn -Dhadoop.version=2.3.0 -Dyarn.version=2.3.0 -DskipTests clean package

I repackaged, chucked it into HDFS and relaunched my app.

Problem solved.

Hopefully, this will save someone else some tedium.

Thanks,

Steve


________________________________
From: Akhil Das [akhil@sigmoidanalytics.com]
Sent: Friday, July 04, 2014 1:55 AM
To: user@spark.apache.org
Subject: Re: No FileSystem for scheme: hdfs

​Most likely you are missing the hadoop configuration files (present in conf/*.xml).​

Thanks
Best Regards


On Fri, Jul 4, 2014 at 7:38 AM, Steven Cox <sc...@renci.org> wrote:
They weren't. They are now and the logs look a bit better - like perhaps some serialization is completing that wasn't before.

But I still get the same error periodically. Other thoughts?

________________________________
From: Soren Macbeth [soren@yieldbot.com]
Sent: Thursday, July 03, 2014 9:54 PM
To: user@spark.apache.org
Subject: Re: No FileSystem for scheme: hdfs

Are the hadoop configuration files on the classpath for your mesos executors?






Re: No FileSystem for scheme: hdfs

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Most likely you are missing the hadoop configuration files (present in
conf/*.xml).
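For example, core-site.xml should name the default filesystem so the hdfs:// scheme can be resolved (the hostname and port below are made up):

```xml
<!-- Illustrative core-site.xml entry; replace host/port with your namenode.
     On Hadoop 1.x the property is fs.default.name instead of fs.defaultFS. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
```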

Thanks
Best Regards


On Fri, Jul 4, 2014 at 7:38 AM, Steven Cox <sc...@renci.org> wrote:

>  They weren't. They are now and the logs look a bit better - like perhaps
> some serialization is completing that wasn't before.
>
>  But I still get the same error periodically. Other thoughts?
>

RE: No FileSystem for scheme: hdfs

Posted by Steven Cox <sc...@renci.org>.
They weren't. They are now and the logs look a bit better - like perhaps some serialization is completing that wasn't before.

But I still get the same error periodically. Other thoughts?

________________________________
From: Soren Macbeth [soren@yieldbot.com]
Sent: Thursday, July 03, 2014 9:54 PM
To: user@spark.apache.org
Subject: Re: No FileSystem for scheme: hdfs

Are the hadoop configuration files on the classpath for your mesos executors?




Re: No FileSystem for scheme: hdfs

Posted by Soren Macbeth <so...@yieldbot.com>.
Are the hadoop configuration files on the classpath for your mesos
executors?

