Posted to user@spark.apache.org by Philip Ogren <ph...@oracle.com> on 2014/01/02 18:22:36 UTC

rdd.saveAsTextFile problem

I have a very simple Spark application that looks like the following:


var myRdd: RDD[Array[String]] = initMyRdd()
println(myRdd.first.mkString(", "))
println(myRdd.count)

myRdd.saveAsTextFile("hdfs://myserver:8020/mydir")
myRdd.saveAsTextFile("target/mydir/")


The println statements work as expected.  The first saveAsTextFile 
statement also works as expected.  The second saveAsTextFile statement 
does not (even if the first is commented out.)  I get the exception 
pasted below.  If I inspect "target/mydir" I see that there is a 
directory called 
_temporary/0/_temporary/attempt_201401020953_0000_m_000000_1 which 
contains an empty part-00000 file.  It's curious because this code 
worked before with Spark 0.8.0 and now I am running on Spark 0.8.1. I 
happen to be running this on Windows in "local" mode at the moment.  
Perhaps I should try running it on my linux box.

Thanks,
Philip


Exception in thread "main" org.apache.spark.SparkException: Job aborted: 
Task 2.0:0 failed more than 0 times; aborting job 
java.lang.NullPointerException
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
     at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
     at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)



Re: rdd.saveAsTextFile problem

Posted by Philip Ogren <ph...@oracle.com>.
Not really.  In practice I write everything out to HDFS and that is 
working fine.  But I write lots of unit tests and example scripts, and it 
is convenient to be able to test a Spark application (or a sequence of 
Spark functions) in a very local way so that it doesn't depend on any 
outside infrastructure (e.g. an HDFS server).  So it is handy to write 
out a small amount of data locally and manually inspect the results, 
especially as I'm building up a unit or regression test.
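
For instance, a minimal sketch of the kind of thing I mean (the data,
scratch path, and app name are illustrative, not the actual test):

import java.nio.file.Files
import org.apache.spark.SparkContext

// Write a handful of records to a scratch directory on the local filesystem
// so the output can be inspected by hand while building up a test.
val sc = new SparkContext("local", "local-inspection-example")
try {
  val outDir = Files.createTempDirectory("spark-test-").resolve("mydir")
  sc.parallelize(Seq(Array("a", "b"), Array("c", "d")))
    .map(_.mkString(", "))
    .saveAsTextFile(outDir.toUri.toString)
} finally {
  sc.stop()
}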

So, ultimately writing results out to a local file isn't that important 
to me.  However, I was just trying to run a simple example script that 
worked before and is now not working.

Thanks,
Philip

On 1/2/2014 10:28 AM, Andrew Ash wrote:
> You want to write it to a local file on the machine?  Try using 
> "file:///path/to/target/mydir/" instead
>
> I'm not sure what behavior would be if you did this on a multi-machine 
> cluster though -- you may get a bit of data on each machine in that 
> local directory.
>
>
> On Thu, Jan 2, 2014 at 12:22 PM, Philip Ogren <philip.ogren@oracle.com> wrote:
>
>     I have a very simple Spark application that looks like the following:
>
>
>     var myRdd: RDD[Array[String]] = initMyRdd()
>     println(myRdd.first.mkString(", "))
>     println(myRdd.count)
>
>     myRdd.saveAsTextFile("hdfs://myserver:8020/mydir")
>     myRdd.saveAsTextFile("target/mydir/")
>
>
>     The println statements work as expected.  The first saveAsTextFile
>     statement also works as expected.  The second saveAsTextFile
>     statement does not (even if the first is commented out.)  I get
>     the exception pasted below.  If I inspect "target/mydir" I see
>     that there is a directory called
>     _temporary/0/_temporary/attempt_201401020953_0000_m_000000_1 which
>     contains an empty part-00000 file.  It's curious because this code
>     worked before with Spark 0.8.0 and now I am running on Spark
>     0.8.1. I happen to be running this on Windows in "local" mode at
>     the moment.  Perhaps I should try running it on my linux box.
>
>     Thanks,
>     Philip
>
>
>     Exception in thread "main" org.apache.spark.SparkException: Job
>     aborted: Task 2.0:0 failed more than 0 times; aborting job
>     java.lang.NullPointerException
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
>         at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
>         at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)
>
>
>


Re: rdd.saveAsTextFile problem

Posted by Tathagata Das <ta...@gmail.com>.
Can you give us a more detailed exception and stack trace from the log? It
should be in the driver log. If not, please take a look at the executor
logs through the web UI to find the stack trace.

TD


On Tue, Mar 25, 2014 at 10:43 PM, gaganbm <ga...@gmail.com> wrote:

> Hi Folks,
>
> Is this issue resolved ? If yes, could you please throw some light on how
> to
> fix this ?
>
> I am facing the same problem during writing to text files.
>
> When I do
>
> stream.foreachRDD(rdd => {
>   rdd.saveAsTextFile("<some path>")
> })
>
> This works fine for me, but it creates multiple text files, one for each
> partition of the RDD.
>
> So I tried the coalesce option to merge the results into a single file for
> each RDD:
>
> stream.foreachRDD(rdd => {
>   rdd.coalesce(1, true).saveAsTextFile("<some path>")
> })
>
> This fails with:
> org.apache.spark.SparkException: Job aborted: Task 75.0:0 failed 1 times
> (most recent failure: Exception failure: java.lang.IllegalStateException:
> unread block data)
>
> I am using Spark Streaming 0.9.0.
>
> Any clue what's going wrong when using coalesce?
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/rdd-saveAsTextFile-problem-tp176p3238.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>

Re: rdd.saveAsTextFile problem

Posted by gaganbm <ga...@gmail.com>.
Hi Folks,

Is this issue resolved ? If yes, could you please throw some light on how to
fix this ?

I am facing the same problem during writing to text files.

When I do 

stream.foreachRDD(rdd => {
  rdd.saveAsTextFile("<some path>")
})

This works fine for me, but it creates multiple text files, one for each
partition of the RDD.

So I tried the coalesce option to merge the results into a single file for
each RDD:

stream.foreachRDD(rdd => {
  rdd.coalesce(1, true).saveAsTextFile("<some path>")
})

This fails with:
org.apache.spark.SparkException: Job aborted: Task 75.0:0 failed 1 times
(most recent failure: Exception failure: java.lang.IllegalStateException:
unread block data)

I am using Spark Streaming 0.9.0.

Any clue what's going wrong when using coalesce?





--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/rdd-saveAsTextFile-problem-tp176p3238.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: rdd.saveAsTextFile problem

Posted by Andrew Ash <an...@andrewash.com>.
I'm guessing it's a documentation issue, but certainly something could have
broken.

- What version of Spark?  -- 0.8.1
- What mode are you running with? (local, standalone, Mesos, YARN) -- local
on Windows
- Are you using the shell or an application? - shell
- What language (Scala / Java / Python) - Scala

Can you provide a deeper error stack trace from the executor?  Look in the
web UI (port 4040) and in the stdout/stderr files.

Also, give it a shot on the Linux box to see if that works.

Cheers!
Andrew


On Thu, Jan 2, 2014 at 1:31 PM, Philip Ogren <ph...@oracle.com> wrote:

>  Yep - that works great and is what I normally do.
>
> I perhaps should have framed my email as a bug report.  The documentation
> for saveAsTextFile says you can write results out to a local file but it
> doesn't work for me per the described behavior.  It also worked before and
> now it doesn't.  So, it seems like a bug.  Should I file a Jira issue?  I
> haven't done that yet for this project but would be happy to.
>
> Thanks,
> Philip
>
>
> On 1/2/2014 11:23 AM, Andrew Ash wrote:
>
> For testing, maybe try using .collect and doing the comparison between
> expected and actual in memory rather than on disk?
>
>
> On Thu, Jan 2, 2014 at 12:54 PM, Philip Ogren <ph...@oracle.com> wrote:
>
>>  I just tried your suggestion and get the same results with the
>> _temporary directory.  Thanks though.
>>
>>
>> On 1/2/2014 10:28 AM, Andrew Ash wrote:
>>
>> You want to write it to a local file on the machine?  Try using
>> "file:///path/to/target/mydir/" instead
>>
>>  I'm not sure what behavior would be if you did this on a multi-machine
>> cluster though -- you may get a bit of data on each machine in that local
>> directory.
>>
>>
>> On Thu, Jan 2, 2014 at 12:22 PM, Philip Ogren <ph...@oracle.com> wrote:
>>
>>> I have a very simple Spark application that looks like the following:
>>>
>>>
>>> var myRdd: RDD[Array[String]] = initMyRdd()
>>> println(myRdd.first.mkString(", "))
>>> println(myRdd.count)
>>>
>>> myRdd.saveAsTextFile("hdfs://myserver:8020/mydir")
>>> myRdd.saveAsTextFile("target/mydir/")
>>>
>>>
>>> The println statements work as expected.  The first saveAsTextFile
>>> statement also works as expected.  The second saveAsTextFile statement does
>>> not (even if the first is commented out.)  I get the exception pasted
>>> below.  If I inspect "target/mydir" I see that there is a directory called
>>> _temporary/0/_temporary/attempt_201401020953_0000_m_000000_1 which contains
>>> an empty part-00000 file.  It's curious because this code worked before
>>> with Spark 0.8.0 and now I am running on Spark 0.8.1. I happen to be
>>> running this on Windows in "local" mode at the moment.  Perhaps I should
>>> try running it on my linux box.
>>>
>>> Thanks,
>>> Philip
>>>
>>>
>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted:
>>> Task 2.0:0 failed more than 0 times; aborting job
>>> java.lang.NullPointerException
>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
>>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
>>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
>>>     at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
>>>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
>>>     at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)
>>>
>>>
>>>
>>
>>
>
>

Re: rdd.saveAsTextFile problem

Posted by Philip Ogren <ph...@oracle.com>.
Yep - that works great and is what I normally do.

I perhaps should have framed my email as a bug report.  The 
documentation for saveAsTextFile says you can write results out to a 
local file but it doesn't work for me per the described behavior. It 
also worked before and now it doesn't.  So, it seems like a bug. Should 
I file a Jira issue?  I haven't done that yet for this project but would 
be happy to.

Thanks,
Philip

On 1/2/2014 11:23 AM, Andrew Ash wrote:
> For testing, maybe try using .collect and doing the comparison between 
> expected and actual in memory rather than on disk?
>
>
> On Thu, Jan 2, 2014 at 12:54 PM, Philip Ogren <philip.ogren@oracle.com> wrote:
>
>     I just tried your suggestion and get the same results with the
>     _temporary directory.  Thanks though.
>
>
>     On 1/2/2014 10:28 AM, Andrew Ash wrote:
>>     You want to write it to a local file on the machine?  Try using
>>     "file:///path/to/target/mydir/" instead
>>
>>     I'm not sure what behavior would be if you did this on a
>>     multi-machine cluster though -- you may get a bit of data on each
>>     machine in that local directory.
>>
>>
>>     On Thu, Jan 2, 2014 at 12:22 PM, Philip Ogren <philip.ogren@oracle.com> wrote:
>>
>>         I have a very simple Spark application that looks like the
>>         following:
>>
>>
>>         var myRdd: RDD[Array[String]] = initMyRdd()
>>         println(myRdd.first.mkString(", "))
>>         println(myRdd.count)
>>
>>         myRdd.saveAsTextFile("hdfs://myserver:8020/mydir")
>>         myRdd.saveAsTextFile("target/mydir/")
>>
>>
>>         The println statements work as expected.  The first
>>         saveAsTextFile statement also works as expected.  The second
>>         saveAsTextFile statement does not (even if the first is
>>         commented out.)  I get the exception pasted below.  If I
>>         inspect "target/mydir" I see that there is a directory called
>>         _temporary/0/_temporary/attempt_201401020953_0000_m_000000_1
>>         which contains an empty part-00000 file.  It's curious
>>         because this code worked before with Spark 0.8.0 and now I am
>>         running on Spark 0.8.1. I happen to be running this on
>>         Windows in "local" mode at the moment.  Perhaps I should try
>>         running it on my linux box.
>>
>>         Thanks,
>>         Philip
>>
>>
>>         Exception in thread "main" org.apache.spark.SparkException:
>>         Job aborted: Task 2.0:0 failed more than 0 times; aborting
>>         job java.lang.NullPointerException
>>             at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
>>             at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
>>             at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>>             at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>             at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
>>             at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
>>             at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
>>             at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)
>>
>>
>>
>
>


Re: rdd.saveAsTextFile problem

Posted by Andrew Ash <an...@andrewash.com>.
For testing, maybe try using .collect and doing the comparison between
expected and actual in memory rather than on disk?
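
For example, a minimal sketch of that approach (the data and expected
values are made up for illustration):

import org.apache.spark.SparkContext

// Compare the computed results against expected values in memory instead of
// writing them out with saveAsTextFile and inspecting files on disk.
val sc = new SparkContext("local", "collect-based-test")
try {
  val actual = sc.parallelize(Seq(Array("a", "b"), Array("c", "d")))
    .map(_.mkString(", "))
    .collect()
    .toSeq
  val expected = Seq("a, b", "c, d")
  assert(actual == expected)
} finally {
  sc.stop()
}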


On Thu, Jan 2, 2014 at 12:54 PM, Philip Ogren <ph...@oracle.com> wrote:

>  I just tried your suggestion and get the same results with the _temporary
> directory.  Thanks though.
>
>
> On 1/2/2014 10:28 AM, Andrew Ash wrote:
>
> You want to write it to a local file on the machine?  Try using
> "file:///path/to/target/mydir/" instead
>
>  I'm not sure what behavior would be if you did this on a multi-machine
> cluster though -- you may get a bit of data on each machine in that local
> directory.
>
>
> On Thu, Jan 2, 2014 at 12:22 PM, Philip Ogren <ph...@oracle.com> wrote:
>
>> I have a very simple Spark application that looks like the following:
>>
>>
>> var myRdd: RDD[Array[String]] = initMyRdd()
>> println(myRdd.first.mkString(", "))
>> println(myRdd.count)
>>
>> myRdd.saveAsTextFile("hdfs://myserver:8020/mydir")
>> myRdd.saveAsTextFile("target/mydir/")
>>
>>
>> The println statements work as expected.  The first saveAsTextFile
>> statement also works as expected.  The second saveAsTextFile statement does
>> not (even if the first is commented out.)  I get the exception pasted
>> below.  If I inspect "target/mydir" I see that there is a directory called
>> _temporary/0/_temporary/attempt_201401020953_0000_m_000000_1 which contains
>> an empty part-00000 file.  It's curious because this code worked before
>> with Spark 0.8.0 and now I am running on Spark 0.8.1. I happen to be
>> running this on Windows in "local" mode at the moment.  Perhaps I should
>> try running it on my linux box.
>>
>> Thanks,
>> Philip
>>
>>
>> Exception in thread "main" org.apache.spark.SparkException: Job aborted:
>> Task 2.0:0 failed more than 0 times; aborting job
>> java.lang.NullPointerException
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
>>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
>>     at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
>>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
>>     at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)
>>
>>
>>
>
>

Re: rdd.saveAsTextFile problem

Posted by Philip Ogren <ph...@oracle.com>.
I just tried your suggestion and get the same results with the 
_temporary directory.  Thanks though.

On 1/2/2014 10:28 AM, Andrew Ash wrote:
> You want to write it to a local file on the machine?  Try using 
> "file:///path/to/target/mydir/" instead
>
> I'm not sure what behavior would be if you did this on a multi-machine 
> cluster though -- you may get a bit of data on each machine in that 
> local directory.
>
>
> On Thu, Jan 2, 2014 at 12:22 PM, Philip Ogren <philip.ogren@oracle.com> wrote:
>
>     I have a very simple Spark application that looks like the following:
>
>
>     var myRdd: RDD[Array[String]] = initMyRdd()
>     println(myRdd.first.mkString(", "))
>     println(myRdd.count)
>
>     myRdd.saveAsTextFile("hdfs://myserver:8020/mydir")
>     myRdd.saveAsTextFile("target/mydir/")
>
>
>     The println statements work as expected.  The first saveAsTextFile
>     statement also works as expected.  The second saveAsTextFile
>     statement does not (even if the first is commented out.)  I get
>     the exception pasted below.  If I inspect "target/mydir" I see
>     that there is a directory called
>     _temporary/0/_temporary/attempt_201401020953_0000_m_000000_1 which
>     contains an empty part-00000 file.  It's curious because this code
>     worked before with Spark 0.8.0 and now I am running on Spark
>     0.8.1. I happen to be running this on Windows in "local" mode at
>     the moment.  Perhaps I should try running it on my linux box.
>
>     Thanks,
>     Philip
>
>
>     Exception in thread "main" org.apache.spark.SparkException: Job
>     aborted: Task 2.0:0 failed more than 0 times; aborting job
>     java.lang.NullPointerException
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
>         at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
>         at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)
>
>
>


Re: rdd.saveAsTextFile problem

Posted by Andrew Ash <an...@andrewash.com>.
You want to write it to a local file on the machine?  Try using
"file:///path/to/target/mydir/" instead

I'm not sure what the behavior would be if you did this on a multi-machine
cluster though -- you may get a bit of data on each machine in that local
directory.
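
For example, a minimal sketch of the save call with an explicit file: URI
(the path is illustrative):

// Write to the local filesystem explicitly rather than letting the bare
// relative path be resolved against the default (HDFS) filesystem.
myRdd.saveAsTextFile("file:///path/to/target/mydir/")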


On Thu, Jan 2, 2014 at 12:22 PM, Philip Ogren <ph...@oracle.com> wrote:

> I have a very simple Spark application that looks like the following:
>
>
> var myRdd: RDD[Array[String]] = initMyRdd()
> println(myRdd.first.mkString(", "))
> println(myRdd.count)
>
> myRdd.saveAsTextFile("hdfs://myserver:8020/mydir")
> myRdd.saveAsTextFile("target/mydir/")
>
>
> The println statements work as expected.  The first saveAsTextFile
> statement also works as expected.  The second saveAsTextFile statement does
> not (even if the first is commented out.)  I get the exception pasted
> below.  If I inspect "target/mydir" I see that there is a directory called
> _temporary/0/_temporary/attempt_201401020953_0000_m_000000_1 which
> contains an empty part-00000 file.  It's curious because this code worked
> before with Spark 0.8.0 and now I am running on Spark 0.8.1. I happen to be
> running this on Windows in "local" mode at the moment.  Perhaps I should
> try running it on my linux box.
>
> Thanks,
> Philip
>
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted:
> Task 2.0:0 failed more than 0 times; aborting job
> java.lang.NullPointerException
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
>     at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
>     at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)
>
>
>

Re: rdd.saveAsTextFile problem

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
On Thu, May 21, 2015 at 4:17 PM, Howard Yang <ho...@gmail.com>
wrote:

> I followed
> http://www.srccodes.com/p/article/38/build-install-configure-run-apache-hadoop-2.2.0-microsoft-windows-os
> to build the latest version of Hadoop on my Windows machine,
> then added the environment variable *HADOOP_HOME* and edited the *Path* variable to
> include the *bin* directory of *HADOOP_HOME* (say *C:\hadoop\bin*).
> That fixed this issue in my environment.
>
> 2015-05-21 9:55 GMT+03:00 Akhil Das <ak...@sigmoidanalytics.com>:
>
>> This thread is from a year back. Can you please share what issue you are
>> facing? Which version of Spark are you using? What is your system
>> environment? Exception stack trace?
>>
>> Thanks
>> Best Regards
>>
>> On Thu, May 21, 2015 at 12:19 PM, Keerthi <ke...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I had tried the workaround shared here, but I am still facing the same
>>> issue...
>>>
>>> Thanks.
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/rdd-saveAsTextFile-problem-tp176p22970.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: user-help@spark.apache.org
>>>
>>>
>>
>

Re: rdd.saveAsTextFile problem

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
This thread is from a year back. Can you please share what issue you are
facing? Which version of Spark are you using? What is your system
environment? Exception stack trace?

Thanks
Best Regards

On Thu, May 21, 2015 at 12:19 PM, Keerthi <ke...@gmail.com>
wrote:

> Hi,
>
> I had tried the workaround shared here, but I am still facing the same issue...
>
> Thanks.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/rdd-saveAsTextFile-problem-tp176p22970.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>

Re: rdd.saveAsTextFile problem

Posted by Keerthi <ke...@gmail.com>.
Hi,

I had tried the workaround shared here, but I am still facing the same issue...

Thanks.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/rdd-saveAsTextFile-problem-tp176p22970.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: rdd.saveAsTextFile problem

Posted by dylanhogg <dy...@gmail.com>.
Try the workaround for Windows found here:
http://qnalist.com/questions/4994960/run-spark-unit-test-on-windows-7.

This fixed the issue when calling rdd.saveAsTextFile(...) for me with Spark
v1.1.0 on Windows 8.1 in local mode.

Summary of steps:

1) Download the compiled winutils.exe from
http://social.msdn.microsoft.com/Forums/windowsazure/en-US/28a57efb-082b-424b-8d9e-731b1fe135de/please-read-if-experiencing-job-failures?forum=hdinsight

2) Put this file into d:\winutil\bin

3) Add this to your code (a minimal sketch follows): System.setProperty("hadoop.home.dir", "d:\\winutil\\")
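
A minimal sketch of how step 3 might look in a driver program (the output
path, object name, and app name are illustrative; winutils.exe must already
be in d:\winutil\bin per steps 1 and 2):

import org.apache.spark.SparkContext

object SaveTextFileLocally {
  def main(args: Array[String]): Unit = {
    // Point Hadoop at the directory containing bin\winutils.exe *before* the
    // SparkContext (and therefore the Hadoop output classes) is created.
    System.setProperty("hadoop.home.dir", "d:\\winutil\\")

    val sc = new SparkContext("local", "save-text-file-locally")
    try {
      sc.parallelize(Seq("a", "b", "c")).saveAsTextFile("target/mydir")
    } finally {
      sc.stop()
    }
  }
}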



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/rdd-saveAsTextFile-problem-tp176p20546.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org