Posted to user@mahout.apache.org by Phil Wills <ot...@gmail.com> on 2014/09/12 18:42:32 UTC

ItemSimilarityDriver failing to write text file

I've been experimenting with the fairly new ItemSimilarityDriver, which is
working fine up until the point it tries to write out its results.
Initially I was getting an issue with the akka frameSize being too small,
but after expanding that I'm now getting a much more cryptic error:

14/09/12 15:54:55 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile at TextDelimitedReaderWriter.scala:288
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 8.0:3 failed 4 times, most recent failure: TID 448 on host ip-10-105-176-77.eu-west-1.compute.internal failed for unknown reason

This is from the master node, but there doesn't seem to be anything more
intelligible in the slave node logs.

I've tried writing to the local file system as well as s3n, and I can see it's
not an access problem, as I am seeing a zero-length file appear.
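
For reference: on the Spark side the frame size is the spark.akka.frameSize
property (in MB; Spark 1.0.x defaulted to 10). A minimal sketch of setting it
programmatically, assuming a hand-built SparkContext rather than the Mahout
CLI (the master URI here is a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: raise the Akka frame size before creating the context.
// "local[4]" is a placeholder master; use your cluster URI instead.
val conf = new SparkConf()
  .setAppName("item-similarity")
  .setMaster("local[4]")
  .set("spark.akka.frameSize", "128") // MB; default was 10 in Spark 1.0.x
val sc = new SparkContext(conf)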

Thanks for any pointers and apologies if this would be better to ask on the
Spark list,

Phil

Re: ItemSimilarityDriver failing to write text file

Posted by Pat Ferrel <pa...@occamsmachete.com>.
This is pretty hard to grok. 

"Paul R. Brown added a comment - 09/Jun/14 16:36 - edited
As food for thought, here is the InnerClass section of the JVM spec. It looks like there have been some changes from 2.10.3 to 2.10.4 (e.g., SI-6546), but I didn't dig in.
I think the thing most likely to work is to ensure that exactly the same bits are used by all of the distributions and posted to Maven Central. (For some discussion on inner class naming stability, there was quite a bit of it on the Java 8 lambda discussion list, e.g., this message.)"

I compile Mahout and Spark for the version of Hadoop I use. It sounds like they are suggesting you do that if you can’t guarantee that all artifacts were built using the same Scala. Can you get source and do the same? 

Not sure what you are suggesting below. In any case the example of how to use Mahout as a lib is ItemSimilarityDriver itself. You could dup that into your own module and invoke it any way you want, but the saveAsText would still have a name mismatch with your version of Spark, right? Seems like the way to solve that is to compile Spark.
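
On the SBT question quoted below, a build.sbt along these lines might express
the dependency once the snapshot has been built and installed locally with
mvn install. This is only a sketch: the artifact IDs are guesses, not verified
coordinates, so check the module poms.

resolvers += Resolver.mavenLocal

// Hypothetical coordinates for a locally installed 1.0-SNAPSHOT;
// verify the artifactIds against the Mahout module poms.
libraryDependencies ++= Seq(
  "org.apache.mahout" % "mahout-math-scala" % "1.0-SNAPSHOT",
  "org.apache.mahout" % "mahout-spark" % "1.0-SNAPSHOT"
)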

On Sep 22, 2014, at 2:12 PM, Phil Wills <ot...@gmail.com> wrote:

So after getting to know Spark a bit better and some further digging, I now
believe this is down to https://issues.apache.org/jira/browse/SPARK-2075.

I thought I could work around this by using Mahout as a library and
submitting it as a standard Spark job. Unfortunately, I can't work out how
to express a dependency on the 1.0-SNAPSHOT appropriately, at least with
SBT, which is my normal build tool. Is there an example build file for
using the snapshot version as a library?

Thanks,

Phil

On Wed, Sep 17, 2014 at 3:11 AM, Pat Ferrel <pa...@gmail.com> wrote:

> Hmm, well if that’s so then you are also able to see the data, since you’re
> reading and writing to the same S3 location in either case. The only
> difference is the Spark master, so perhaps it’s a Spark issue? Not
> sure I can help much more. I don’t have access to the same setup as you
> have. Is the Spark community able to help, or at least throw the ball back
> in my court?
> 
> Does the debug output indicate that the read and computation went OK? Does
> it look the same as running local? No new warnings earlier in the run? BTW,
> to get local mode to use multiple cores, run with the master set to
> something like “local[4]”.
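
For instance, adapting the invocations shown further down this thread (a
sketch; the input and output paths are placeholders):

bin/mahout spark-itemsimilarity --input <input> --output <output> --master 'local[4]'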
> 
> On Sep 16, 2014, at 1:22 PM, Phil Wills <ot...@gmail.com> wrote:
> 
> No, by local I mean running on one large EC2 box spun up by the same
> script, but running the 'mahout spark-itemsimilarity' command without a
> master specified, so that it runs locally on that box. So I'm confident
> the versions are the same between the local-to-that-box and distributed
> cluster modes. Apologies for the lack of clarity.
> 
> Phil
> 
> On Tue, Sep 16, 2014 at 7:48 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
> 
>> By local I assume you are talking about your dev machine, not one of the
>> cluster machines.
>> 
>> Excuse me if I’m stating the obvious but you are using two completely
>> different Spark and Hadoop installations, one local and one remote. They
>> could be completely different codebases. Just because you have configured
>> Spark and Hadoop to execute locally doesn’t mean they work remotely. It
>> sounds like you are using the CLI on your dev machine, which is set to
>> run locally, and passing a remote Spark master URI and S3 URI to the local
>> Mahout script. I would install and set up Mahout on your cluster master,
>> make sure MAHOUT_LOCAL is not set there since you will be using a cluster,
>> and execute the CLI from there.
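
Concretely, the suggested setup might look like this on the cluster master.
This is a sketch; the paths, bucket names, and host are placeholders:

# On the cluster master (sketch)
unset MAHOUT_LOCAL
export MAHOUT_HOME=/path/to/mahout
cd $MAHOUT_HOME
bin/mahout spark-itemsimilarity --input s3n://<bucket>/<path> \
  --output s3n://<bucket>/<out> --master spark://<master-host>:7077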
>> 
>> Furthermore are you sure that the remote Spark cluster can see the S3
>> data? Ssh to the Spark master and do something like “hadoop fs -ls” or
>> supply the URI to verify that the Hadoop config on the remote cluster,
>> which is what the remote Spark will use, can get to the data.
>> 
>> 
>> On Sep 15, 2014, at 2:28 PM, Phil Wills <ot...@gmail.com> wrote:
>> 
>> The data and s3n file system is OK, since when I run 'locally' that's just
>> without a master specified, but otherwise identically, and it works fine.
>> I've been using the spark-ec2 scripts to retrieve Spark and Hadoop, so I had
>> assumed that meant they were operating compatible versions, but I'm not
>> specifying which Hadoop to use explicitly, so I don't know if that has an
>> effect.
>> 
>> Phil
>> 
>> On Mon, Sep 15, 2014 at 7:25 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>> 
>>> It should handle this input—no surprise.
>>> 
>>> Spark must be compiled for the correct version of Hadoop that you are
>>> using (Mahout also). I’d make sure Spark is working properly with your
>>> HDFS by trying one of their examples if you haven’t already. Running
>>> locally may not be using the same version of Hadoop, have you checked that?
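
The builds being described might look roughly like this. This is only a
sketch; the exact flags and profiles depend on your Spark, Mahout, and Hadoop
versions, so check each project's building docs:

# Spark 1.0.x against a specific Hadoop version (sketch; extra profiles
# such as -Pyarn may be needed depending on your setup)
mvn -Dhadoop.version=2.4.0 -DskipTests clean package

# Mahout 1.0-SNAPSHOT built and installed to the local Maven repo (sketch)
mvn clean install -DskipTests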
>>> 
>>> A filenamePattern of ‘.*’ will get all files in
>>> s3n://recommendation-logs/2014/09/06 and you have it set to search
>>> recursively. Check to make sure this is what you want. Did you use the
>>> same dir structure as you have on s3n when you ran locally? Since this
>>> driver looks at text files, it can think it is working on data if it
>>> finds “[\t, ]”, a tab, comma, or space, in the line when it’s actually
>>> reading garbage, so you should be sure it is working on only the files
>>> you want. Tell it to look for only a tab if that’s what you are using,
>>> or use a regex to match the entire filename like “^part.*” or “.*log”.
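
For example, a pattern that restricts discovery to Hadoop part files,
reusing the flags already shown in this thread (a sketch):

bin/mahout spark-itemsimilarity --input s3n://recommendation-logs/2014/09/06 --output s3n://recommendation-outputs/2014/09/06 --filenamePattern '^part.*' --recursive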
>>> 
>>> I have not tested with s3n:// URIs. I assume you can read all these with
>>> the hadoop tools, like “hadoop fs -ls s3n://recommendation-logs/2014/09/06”?
>>> 
>>> Off-list I’ll send a link to the epinions data formatted for Mahout. You
>>> can try putting that in HDFS via s3n and running it, because I have
>>> tested that on a cluster. It is all in one file though, so if there is a
>>> problem in file discovery it won’t show up.
>>> 
>>> 
>>> On Sep 15, 2014, at 9:10 AM, Phil Wills <ot...@gmail.com> wrote:
>>> 
>>> Tried running locally on a reasonably beefy machine and it worked fine.
>>> Which is the toy data you're referring to?
>>> 
>>> JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64 SPARK_HOME=/root/spark
>>> MAHOUT_HOME=. bin/mahout spark-itemsimilarity --input
>>> s3n://recommendation-logs/2014/09/06 --output
>>> s3n://recommendation-outputs/2014/09/06 --filenamePattern '.*' --recursive
>>> --master spark://ec2-54-75-13-36.eu-west-1.compute.amazonaws.com:7077
>>> --sparkExecutorMem 6g
>>> 
>>> and the working version running locally on a beefier box:
>>> 
>>> JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64 SPARK_HOME=/root/spark
>>> MAHOUT_HOME=. MAHOUT_HEAPSIZE=16000 bin/mahout spark-itemsimilarity --input
>>> s3n://ophan-recommendation-logs/2014/09/06 --output
>>> s3n://ophan-recommendation-outputs/2014/09/06 --filenamePattern '.*'
>>> --recursive --sparkExecutorMem 16g
>>> 
>>> Sample input:
>>> 
>>> nnS1dIIBBtTnehVD79lgYeBw        http://www.example.com/world/2014/sep/05/malaysia-airlines-mh370-six-months-chinese-families-lack-answers
>>> ikFSk14vHrTPqjSISvMihDUg        http://www.example.com/world/2014/sep/05/obama-core-coalition-10-countries-to-fight-isis
>>> edqu8kfgsFSg2w3MhV5rUwuQ        http://www.example.com/lifeandstyle/wordofmouth/2014/sep/05/food-and-drink2?CMP=fb_gu
>>> pfnmfONG1DQWG_EOOIxUASow        http://www.example.com/world/live/2014/sep/05/unresponsive-plane-f15-jets-aircraft-live-updates
>>> pfUil_W0s2TZSqojMQrVcxVw        http://www.example.com/football/blog/2014/sep/05/jose-mourinho-bargain-loic-remy-chelsea-france
>>> nxTJnpyenFSP-tqWSLHQdW8w        http://www.example.com/books/2014/sep/05/were-we-happier-in-the-stone-age
>>> lba37jwJVQS5GbiSuus1i6tA        http://www.example.com/stage/2014/sep/05/titus-andronicus-review-visually-striking-but-flawed
>>> bEHaOzZPbtQz-X2K1wortBQQ        http://www.example.com/cities/2014/sep/05/death-america-suburban-dream-ferguson-missouri-resegregation
>>> gjTGzDXiDOT5W2SThhm0tUmg        http://www.example.com/world/2014/sep/05/man-jailed-phoning-texting-ex-21807-times
>>> pfFbQ5ddvBRhm0XLZbN6Xd2A        http://www.example.com/sport/2014/sep/05/gloucester-northampton-premiership-rugby
>>> 
>>> 
>>> On Sun, Sep 14, 2014 at 4:06 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>>> 
>>>> I wonder if it’s trying to write an empty RDD to a text file. Can you
>>>> give the CLI options and a snippet of data?
>>>> 
>>>> Also have you successfully run this on the toy data in the resource dir?
>>>> There is a script to run it locally that you can adapt for running on a
>>>> cluster. This will eliminate any cluster problem.
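
A guard for the empty-RDD theory might look like the sketch below. This is
hypothetical code, not Mahout's actual write path; note that Spark 1.0.x has
no RDD.isEmpty, so it peeks at one element instead:

import org.apache.spark.rdd.RDD

// Hypothetical: only write when the RDD has at least one element.
def saveIfNonEmpty(rdd: RDD[String], path: String): Unit =
  if (rdd.take(1).nonEmpty) rdd.saveAsTextFile(path)
  else println(s"RDD is empty; skipping write to $path")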
>>>> 
>>>> 
>>>> On Sep 13, 2014, at 1:13 PM, Phil Wills <ot...@gmail.com> wrote:
>>>> 
>>>> Here's the master log from the line with the stack trace to termination:
>>>> 
>>>> 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile at TextDelimitedReaderWriter.scala:288
>>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 8.0:3 failed 4 times, most recent failure: TID 448 on host ip-10-105-176-77.eu-west-1.compute.internal failed for unknown reason
>>>> Driver stacktrace:
>>>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>>>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>>>> at scala.Option.foreach(Option.scala:236)
>>>> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>> 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Executor lost: 8 (epoch 20)
>>>> 14/09/12 15:54:55 INFO storage.BlockManagerMasterActor: Trying to remove executor 8 from BlockManagerMaster.
>>>> 14/09/12 15:54:55 INFO storage.BlockManagerMaster: Removed 8 successfully in removeExecutor
>>>> 14/09/12 15:54:55 INFO storage.BlockManagerInfo: Registering block manager ip-10-105-176-77.eu-west-1.compute.internal:58803 with 3.4 GB RAM
>>>> 14/09/12 15:54:55 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-10-90-1-56.eu-west-1.compute.internal:56590/user/Executor#1456047585] with ID 9
>>>> 
>>>> On Sat, Sep 13, 2014 at 4:21 PM, Pat Ferrel <pa...@gmail.com> wrote:
>>>> 
>>>>> It’s not an error I’ve seen, but they can tend to be pretty cryptic.
>>>>> Could you post more of the stack trace?
>>>>> 
>>>>> On Sep 12, 2014, at 2:55 PM, Phil Wills <ot...@gmail.com> wrote:
>>>>> 
>>>>> I've tried on 1.0.1 and 1.0.2, updating the pom to 1.0.2 when running
>>>>> on that. I used the spark-ec2 scripts to set up the cluster.
>>>>> 
>>>>> I might be able to share the data. I'll mull it over the weekend to
>>>>> make sure there's nothing sensitive, or see if there's a way I can
>>>>> transform it to that point.
>>>>> 
>>>>> Phil
>>>>> 
>>>>> 
>>>>> On Fri, Sep 12, 2014 at 6:30 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>>>>> 
>>>>>> The Mahout pom says Spark 1.0.1 but I’m running fine on 1.0.2
>>>>>> 
>>>>>> 
>>>>>> On Sep 12, 2014, at 10:08 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>>>>>> 
>>>>>> Is it a mature Spark cluster, and what version of Spark?
>>>>>> 
>>>>>> If you can share the data I can try it on mine.
>>>>>> 
>>>>>> On Sep 12, 2014, at 9:42 AM, Phil Wills <ot...@gmail.com> wrote:
>>>>>> 
>>>>>> I've been experimenting with the fairly new ItemSimilarityDriver,
>>>>>> which is working fine up until the point it tries to write out its
>>>>>> results. Initially I was getting an issue with the akka frameSize
>>>>>> being too small, but after expanding that I'm now getting a much more
>>>>>> cryptic error:
>>>>>> 
>>>>>> 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile at TextDelimitedReaderWriter.scala:288
>>>>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 8.0:3 failed 4 times, most recent failure: TID 448 on host ip-10-105-176-77.eu-west-1.compute.internal failed for unknown reason
>>>>>> 
>>>>>> This is from the master node, but there doesn't seem to be anything
>>>>>> more intelligible in the slave node logs.
>>>>>> 
>>>>>> I've tried writing to the local file system as well as s3n and can
>>>>>> see it's not an access problem, as I am seeing a zero-length file
>>>>>> appear.
>>>>>> 
>>>>>> Thanks for any pointers and apologies if this would be better to ask
>>>>>> on the Spark list,
>>>>>> 
>>>>>> Phil


Re: ItemSimilarityDriver failing to write text file

Posted by Phil Wills <ot...@gmail.com>.
So after getting to know Spark a bit better and some further digging, I now
believe this is down to https://issues.apache.org/jira/browse/SPARK-2075.

I thought I could work around this, by using Mahout as a library and
submitting it as a standard Spark job. Unfortunately, I can't work out how
to express a dependency on the 1.0-SNAPSHOT appropriately, at least with
SBT, which is my normal build tool. Is there an example build file for
using the snapshot version as a library?

Thanks,

Phil

On Wed, Sep 17, 2014 at 3:11 AM, Pat Ferrel <pa...@gmail.com> wrote:

> Hmm, well if that’s so then you are also able to see the data since you’re
> reading and writing to the same S3 location in either case. The only
> difference is the Spark master and therefore perhaps a Spark issue?  Not
> sure I can help much more. I don’t have access to the same setup as you
> have. Is the Spark community able to help or at least throw the ball back
> in my court?
>
> Does the debug output indicate that the read and computation went ok? Does
> it look the same as running local? No new warnings earlier in the run? BTW
> to get local to use multiple cores run with master set to something like
> “local[4]”.
>
> On Sep 16, 2014, at 1:22 PM, Phil Wills <ot...@gmail.com> wrote:
>
> No, by local I mean running on one a large ec2 box spun up by the same
> script, but running the 'mahout spark-itemsimilarity' command without a
> master specified, so that it runs locally to that box, so I'm confident
> about the versions being the same in local to that box and distributed
> across the cluster modes. Apologies for the lack of clarity.
>
> Phil
>
> On Tue, Sep 16, 2014 at 7:48 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>
> > By local I assume you are talking about your dev machine, not one of the
> > cluster machines.
> >
> > Excuse me if I’m stating the obvious but you are using two completely
> > different Spark and Hadoop installations, one local and one remote. They
> > could be completely different codebases. Just because you have configured
> > Spark and Hadoop to execute locally doesn’t mean they work remotely. It
> > sounds like you are using the CLI on your dev machine, which is set to
> run
> > locally, and passing a remote Spark master URI and S3 URI to the local
> > Mahout script. I would install and set up Mahout on your cluster master,
> > make sure MAHOUT_LOCAL is not set there since you will be using a
> cluster,
> > and execute the CLI from there.
> >
> > Furthermore are you sure that the remote Spark cluster can see the S3
> > data? Ssh to the Spark master and do something like “hadoop fs -ls” or
> > supply the URI to verify that the Hadoop config on the remote cluster,
> > which is what the remote Spark will use, can get to the data.
> >
> >
> > On Sep 15, 2014, at 2:28 PM, Phil Wills <ot...@gmail.com> wrote:
> >
> > The data and s3n file system is OK, since when I run 'locally' that's
> just
> > without a master specified, but otherwise identically, it works fine.
> I've
> > been using the spark-ec2 scripts to retrieve spark and hadoop, so had
> > assumed that meant they were operating compatible versions, but I'm not
> > specifying which hadoop to use explicitly, so I don't know if that has an
> > effect.
> >
> > Phil
> >
> > On Mon, Sep 15, 2014 at 7:25 PM, Pat Ferrel <pa...@occamsmachete.com>
> wrote:
> >
> >> It should handle this input—no surprise.
> >>
> >> Spark must be compiled for the correct version of Hadoop that you are
> >> using (Mahout also). I’d make sure Spark is working properly with your
> > HDFS
> >> by trying one of their examples if you haven’t already. Running locally
> > may
> >> not be using the same version of Hadoop, have you checked that?
> >>
> >> A filenamePattern of ‘.*’ will get all files in
> >> s3n://recommendation-logs/2014/09/06 and you have it set to search
> >> recursively. Check to make sure this is what you want. Did you use the
> > same
> >> dir structure as you have on s3n when you ran locally? Since this driver
> >> looks at text files it can think it is working on data if it finds “[\t,
> > ]”
> >> a tab, comma, or space in the line when it’s reading garbage so you
> > should
> >> be sure it is working on only the files you want. Tell it to look for
> > only
> >> a tab if that’s what you are using or use a regex to match the entire
> >> filename like “^part.*” or “.*log”.
> >>
> >> I have not tested with s3n:// URIs. I assume you can read all these with
> >> the hadoop tools like “hadoop fs -ls
> > s3n://recommendation-logs/2014/09/06”?
> >>
> >> off-list I’ll send a link to epinions data formatted for Mahout. You can
> >> try putting that in HDFS via sn3 and running it because I have tested
> > that
> >> on a cluster. It is all in one file though so if there is a problem in
> > file
> >> discovery it won’t show up.
> >>
> >>
> >> On Sep 15, 2014, at 9:10 AM, Phil Wills <ot...@gmail.com> wrote:
> >>
> >> Tried running locally on a reasonably beefy machine and it worked fine.
> >> Which is the toy data, you're referring to?
> >>
> >> JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64 SPARK_HOME=/root/spark
> >> MAHOUT_HOME=. bin/mahout spark-itemsimilarity --input
> >> s3n://recommendation-logs/2014/09/06 --output
> >> s3n://recommendation-outputs/2014/09/06 --filenamePattern '.*'
> > --recursive
> >> --master spark://ec2-54-75-13-36.eu-west-1.compute.amazonaws.com:7077
> >> --sparkExecutorMem 6g
> >>
> >> and the working version running locally on a beefier box:
> >>
> >> JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64 SPARK_HOME=/root/spark
> >> MAHOUT_HOME=. MAHOUT_HEAPSIZE=16000 bin/mahout spark-itemsimilarity
> > --input
> >> s3n://ophan-recommendation-logs/2014/09/06 --output
> >> s3n://ophan-recommendation-outputs/2014/09/06 --filenamePattern '.*'
> >> --recursive  --sparkExecutorMem 16g
> >>
> >> Sample input:
> >>
> >> nnS1dIIBBtTnehVD79lgYeBw
> >>
> >>
> >
> http://www.example.com/world/2014/sep/05/malaysia-airlines-mh370-six-months-chinese-families-lack-answers
> >>
> >> ikFSk14vHrTPqjSISvMihDUg
> >>
> >>
> >
> http://www.example.com/world/2014/sep/05/obama-core-coalition-10-countries-to-fight-isis
> >>
> >> edqu8kfgsFSg2w3MhV5rUwuQ
> >>
> >>
> >
> http://www.example.com/lifeandstyle/wordofmouth/2014/sep/05/food-and-drink2?CMP=fb_gu
> >>
> >> pfnmfONG1DQWG_EOOIxUASow
> >>
> >>
> >
> http://www.example.com/world/live/2014/sep/05/unresponsive-plane-f15-jets-aircraft-live-updates
> >>
> >> pfUil_W0s2TZSqojMQrVcxVw        http://www.
> >>
> >>
> >
> example.com/football/blog/2014/sep/05/jose-mourinho-bargain-loic-remy-chelsea-france
> >>
> >> nxTJnpyenFSP-tqWSLHQdW8w
> >>
> >
> http://www.example.com/books/2014/sep/05/were-we-happier-in-the-stone-age
> >>
> >> lba37jwJVQS5GbiSuus1i6tA
> >>
> >>
> >
> http://www.example.com/stage/2014/sep/05/titus-andronicus-review-visually-striking-but-flawed
> >>
> >> bEHaOzZPbtQz-X2K1wortBQQ
> >>
> >>
> >
> http://www.example.com/cities/2014/sep/05/death-america-suburban-dream-ferguson-missouri-resegregation
> >>
> >> gjTGzDXiDOT5W2SThhm0tUmg
> >>
> >>
> >
> http://www.example.com/world/2014/sep/05/man-jailed-phoning-texting-ex-21807-times
> >>
> >> pfFbQ5ddvBRhm0XLZbN6Xd2A
> >>
> >>
> >
> http://www.example.com/sport/2014/sep/05/gloucester-northampton-premiership-rugby
> >>
> >>
> >>
> >> On Sun, Sep 14, 2014 at 4:06 PM, Pat Ferrel <pa...@occamsmachete.com>
> > wrote:
> >>
> >>> I wonder if it’s trying to write an empty rdd to a text file. Can you
> >> give
> >>> the CLI options and a snippet of data?
> >>>
> >>> Also have you successfully run this on the toy data in the resource
> dir?
> >>> There is a script to run it locally that you can adapt for running on a
> >>> cluster. This will eliminate any cluster problem.
> >>>
> >>>
> >>> On Sep 13, 2014, at 1:13 PM, Phil Wills <ot...@gmail.com> wrote:
> >>>
> >>> Here's the master log from the line with the stack trace to
> termination:
> >>>
> >>> 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Failed to run
> >> saveAsTextFile
> >>> at TextDelimitedReaderWriter.scala:288
> >>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
> >> due
> >>> to stage failure: Task 8.0:3 failed 4 times, most recent failure: TID
> > 448
> >>> on host ip-10-105-176-77.eu-west-1.compute.internal failed for unknown
> >>> reason
> >>> Driver stacktrace:
> >>> at org.apache.spark.scheduler.DAGScheduler.org
> >>>
> >>>
> >>
> >
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
> >>> at
> >>>
> >>>
> >>
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
> >>> at
> >>>
> >>>
> >>
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
> >>> at
> >>>
> >>>
> >>
> >
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> >>> at
> >>>
> >>
> >
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
> >>> at
> >>>
> >>>
> >>
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
> >>> at
> >>>
> >>>
> >>
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
> >>> at scala.Option.foreach(Option.scala:236)
> >>> at
> >>>
> >>>
> >>
> >
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
> >>> at
> >>>
> >>>
> >>
> >
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
> >>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> >>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> >>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> >>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> >>> at
> >>>
> >>>
> >>
> >
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> >>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >>> at
> >>>
> >>>
> >>
> >
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >>> at
> >> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >>> at
> >>>
> >>>
> >>
> >
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >>> 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Executor lost: 8 (epoch
> >> 20)
> >>> 14/09/12 15:54:55 INFO storage.BlockManagerMasterActor: Trying to
> remove
> >>> executor 8 from BlockManagerMaster.
> >>> 14/09/12 15:54:55 INFO storage.BlockManagerMaster: Removed 8
> > successfully
> >>> in removeExecutor
> >>> 14/09/12 15:54:55 INFO storage.BlockManagerInfo: Registering block
> >> manager
> >>> ip-10-105-176-77.eu-west-1.compute.internal:58803 with 3.4 GB RAM
> >>> 14/09/12 15:54:55 INFO cluster.SparkDeploySchedulerBackend: Registered
> >>> executor:
> >>> Actor[akka.tcp://sparkExecutor@ip-10-90-1-56.eu-west-1.compute.internal
> >>> :56590/user/Executor#1456047585]
> >>> with ID 9
> >>>
> >>> On Sat, Sep 13, 2014 at 4:21 PM, Pat Ferrel <pa...@gmail.com>
> >> wrote:
> >>>
> >>>> It’s not an error I’ve seen but they can tend to be pretty cryptic.
> >> Could
> >>>> you post more of the stack trace?
> >>>>
> >>>> On Sep 12, 2014, at 2:55 PM, Phil Wills <ot...@gmail.com> wrote:
> >>>>
> >>>> I've tried on 1.0.1 and 1.0.2, updating the pom to 1.0.2 when running
> > on
> >>>> that.  I used the spark-ec2 scripts to set up the cluster.
> >>>>
> >>>> I might be able to share the data I'll mull it over the weekend to
> make
> >>>> sure there's nothing sensitive, or if there's a way I can transform it
> >> to
> >>>> that point.
> >>>>
> >>>> Phil
> >>>>
> >>>>
> >>>> On Fri, Sep 12, 2014 at 6:30 PM, Pat Ferrel <pa...@occamsmachete.com>
> >>> wrote:
> >>>>
> >>>>> The mahout pom says 1.0.1 but I’m running fine on 1.0.2
> >>>>>
> >>>>>
> >>>>> On Sep 12, 2014, at 10:08 AM, Pat Ferrel <pa...@occamsmachete.com>
> >> wrote:
> >>>>>
> >>>>> Is it a mature Spark cluster, what version of Spark?
> >>>>>
> >>>>> If you can share the data I can try it on mine.
> >>>>>
> >>>>> On Sep 12, 2014, at 9:42 AM, Phil Wills <ot...@gmail.com> wrote:
> >>>>>
> >>>>> I've been experimenting with the fairly new ItemSimilarityDriver,
> > which
> >>>> is
> >>>>> working fine up until the point it tries to write out it's results.
> >>>>> Initially I was getting an issue with the akka frameSize being too
> >>> small,
> >>>>> but after expanding that I'm now getting a much more cryptic error:
> >>>>>
> >>>>> 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Failed to run
> >>>> saveAsTextFile
> >>>>> at TextDelimitedReaderWriter.scala:288
> >>>>> Exception in thread "main" org.apache.spark.SparkException: Job
> > aborted
> >>>> due
> >>>>> to stage failure: Task 8.0:3 failed 4 times, most recent failure: TID
> >>> 448
> >>>>> on host ip-10-105-176-77.eu-west-1.compute.internal failed for
> unknown
> >>>>> reason
> >>>>>
> >>>>> This is from the master node, but there doesn't seem to be anything
> >> more
> >>>>> intelligible in the slave node logs.
> >>>>>
> >>>>> I've tried writing to the local file system as well as s3n and can
> see
> >>>> it's
> >>>>> not an access problem, as I am seeing a zero length file appear.
> >>>>>
> >>>>> Thanks for any pointers and apologies if this would be better to ask
> > on
> >>>> the
> >>>>> Spark list,
> >>>>>
> >>>>> Phil
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
>
>

Re: ItemSimilarityDriver failing to write text file

Posted by Pat Ferrel <pa...@gmail.com>.
Hmm, well if that’s so then you are also able to see the data since you’re reading and writing to the same S3 location in either case. The only difference is the Spark master and therefore perhaps a Spark issue?  Not sure I can help much more. I don’t have access to the same setup as you have. Is the Spark community able to help or at least throw the ball back in my court?

Does the debug output indicate that the read and computation went ok? Does it look the same as running local? No new warnings earlier in the run? BTW to get local to use multiple cores run with master set to something like “local[4]”.

On Sep 16, 2014, at 1:22 PM, Phil Wills <ot...@gmail.com> wrote:

No, by local I mean running on one a large ec2 box spun up by the same
script, but running the 'mahout spark-itemsimilarity' command without a
master specified, so that it runs locally to that box, so I'm confident
about the versions being the same in local to that box and distributed
across the cluster modes. Apologies for the lack of clarity.

Phil

On Tue, Sep 16, 2014 at 7:48 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> By local I assume you are talking about your dev machine, not one of the
> cluster machines.
> 
> Excuse me if I’m stating the obvious but you are using two completely
> different Spark and Hadoop installations, one local and one remote. They
> could be completely different codebases. Just because you have configured
> Spark and Hadoop to execute locally doesn’t mean they work remotely. It
> sounds like you are using the CLI on your dev machine, which is set to run
> locally, and passing a remote Spark master URI and S3 URI to the local
> Mahout script. I would install and set up Mahout on your cluster master,
> make sure MAHOUT_LOCAL is not set there since you will be using a cluster,
> and execute the CLI from there.
> 
> Furthermore are you sure that the remote Spark cluster can see the S3
> data? Ssh to the Spark master and do something like “hadoop fs -ls” or
> supply the URI to verify that the Hadoop config on the remote cluster,
> which is what the remote Spark will use, can get to the data.
> 
> 
> On Sep 15, 2014, at 2:28 PM, Phil Wills <ot...@gmail.com> wrote:
> 
> The data and s3n file system is OK, since when I run 'locally' that's just
> without a master specified, but otherwise identically, it works fine. I've
> been using the spark-ec2 scripts to retrieve spark and hadoop, so had
> assumed that meant they were operating compatible versions, but I'm not
> specifying which hadoop to use explicitly, so I don't know if that has an
> effect.
> 
> Phil
> 
> On Mon, Sep 15, 2014 at 7:25 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
> 
>> It should handle this input—no surprise.
>> 
>> Spark must be compiled for the correct version of Hadoop that you are
>> using (Mahout also). I’d make sure Spark is working properly with your
> HDFS
>> by trying one of their examples if you haven’t already. Running locally
> may
>> not be using the same version of Hadoop, have you checked that?
>> 
>> A filenamePattern of ‘.*’ will get all files in
>> s3n://recommendation-logs/2014/09/06 and you have it set to search
>> recursively. Check to make sure this is what you want. Did you use the
> same
>> dir structure as you have on s3n when you ran locally? Since this driver
>> looks at text files it can think it is working on data if it finds “[\t,
> ]”
>> a tab, comma, or space in the line when it’s reading garbage so you
> should
>> be sure it is working on only the files you want. Tell it to look for
> only
>> a tab if that’s what you are using or use a regex to match the entire
>> filename like “^part.*” or “.*log”.
>> 
>> I have not tested with s3n:// URIs. I assume you can read all these with
>> the hadoop tools like “hadoop fs -ls
> s3n://recommendation-logs/2014/09/06”?
>> 
>> off-list I’ll send a link to epinions data formatted for Mahout. You can
>> try putting that in HDFS via sn3 and running it because I have tested
> that
>> on a cluster. It is all in one file though so if there is a problem in
> file
>> discovery it won’t show up.
>> 
>> 
>> On Sep 15, 2014, at 9:10 AM, Phil Wills <ot...@gmail.com> wrote:
>> 
>> Tried running locally on a reasonably beefy machine and it worked fine.
>> Which is the toy data, you're referring to?
>> 
>> JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64 SPARK_HOME=/root/spark
>> MAHOUT_HOME=. bin/mahout spark-itemsimilarity --input
>> s3n://recommendation-logs/2014/09/06 --output
>> s3n://recommendation-outputs/2014/09/06 --filenamePattern '.*'
> --recursive
>> --master spark://ec2-54-75-13-36.eu-west-1.compute.amazonaws.com:7077
>> --sparkExecutorMem 6g
>> 
>> and the working version running locally on a beefier box:
>> 
>> JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64 SPARK_HOME=/root/spark
>> MAHOUT_HOME=. MAHOUT_HEAPSIZE=16000 bin/mahout spark-itemsimilarity
> --input
>> s3n://ophan-recommendation-logs/2014/09/06 --output
>> s3n://ophan-recommendation-outputs/2014/09/06 --filenamePattern '.*'
>> --recursive  --sparkExecutorMem 16g
>> 
>> Sample input:
>> 
>> nnS1dIIBBtTnehVD79lgYeBw
>> 
>> 
> http://www.example.com/world/2014/sep/05/malaysia-airlines-mh370-six-months-chinese-families-lack-answers
>> 
>> ikFSk14vHrTPqjSISvMihDUg
>> 
>> 
> http://www.example.com/world/2014/sep/05/obama-core-coalition-10-countries-to-fight-isis
>> 
>> edqu8kfgsFSg2w3MhV5rUwuQ
>> 
>> 
> http://www.example.com/lifeandstyle/wordofmouth/2014/sep/05/food-and-drink2?CMP=fb_gu
>> 
>> pfnmfONG1DQWG_EOOIxUASow
>> 
>> 
> http://www.example.com/world/live/2014/sep/05/unresponsive-plane-f15-jets-aircraft-live-updates
>> 
>> pfUil_W0s2TZSqojMQrVcxVw        http://www.
>> 
>> 
> example.com/football/blog/2014/sep/05/jose-mourinho-bargain-loic-remy-chelsea-france
>> 
>> nxTJnpyenFSP-tqWSLHQdW8w
>> 
> http://www.example.com/books/2014/sep/05/were-we-happier-in-the-stone-age
>> 
>> lba37jwJVQS5GbiSuus1i6tA
>> 
>> 
> http://www.example.com/stage/2014/sep/05/titus-andronicus-review-visually-striking-but-flawed
>> 
>> bEHaOzZPbtQz-X2K1wortBQQ
>> 
>> 
> http://www.example.com/cities/2014/sep/05/death-america-suburban-dream-ferguson-missouri-resegregation
>> 
>> gjTGzDXiDOT5W2SThhm0tUmg
>> 
>> 
> http://www.example.com/world/2014/sep/05/man-jailed-phoning-texting-ex-21807-times
>> 
>> pfFbQ5ddvBRhm0XLZbN6Xd2A
>> 
>> 
> http://www.example.com/sport/2014/sep/05/gloucester-northampton-premiership-rugby
>> 
>> 
>> 
>> On Sun, Sep 14, 2014 at 4:06 PM, Pat Ferrel <pa...@occamsmachete.com>
> wrote:
>> 
>>> I wonder if it’s trying to write an empty rdd to a text file. Can you
>> give
>>> the CLI options and a snippet of data?
>>> 
>>> Also have you successfully run this on the toy data in the resource dir?
>>> There is a script to run it locally that you can adapt for running on a
>>> cluster. This will eliminate any cluster problem.
>>> 
>>> 
>>> On Sep 13, 2014, at 1:13 PM, Phil Wills <ot...@gmail.com> wrote:
>>> 
>>> Here's the master log from the line with the stack trace to termination:
>>> 
>>> 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Failed to run
>> saveAsTextFile
>>> at TextDelimitedReaderWriter.scala:288
>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>> due
>>> to stage failure: Task 8.0:3 failed 4 times, most recent failure: TID
> 448
>>> on host ip-10-105-176-77.eu-west-1.compute.internal failed for unknown
>>> reason
>>> Driver stacktrace:
>>> at org.apache.spark.scheduler.DAGScheduler.org
>>> 
>>> 
>> 
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
>>> at
>>> 
>>> 
>> 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
>>> at
>>> 
>>> 
>> 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
>>> at
>>> 
>>> 
>> 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>> at
>>> 
>> 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
>>> at
>>> 
>>> 
>> 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>>> at
>>> 
>>> 
>> 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
>>> at scala.Option.foreach(Option.scala:236)
>>> at
>>> 
>>> 
>> 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
>>> at
>>> 
>>> 
>> 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>> at
>>> 
>>> 
>> 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> at
>>> 
>>> 
>> 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> at
>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> at
>>> 
>>> 
>> 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Executor lost: 8 (epoch
>> 20)
>>> 14/09/12 15:54:55 INFO storage.BlockManagerMasterActor: Trying to remove
>>> executor 8 from BlockManagerMaster.
>>> 14/09/12 15:54:55 INFO storage.BlockManagerMaster: Removed 8
> successfully
>>> in removeExecutor
>>> 14/09/12 15:54:55 INFO storage.BlockManagerInfo: Registering block
>> manager
>>> ip-10-105-176-77.eu-west-1.compute.internal:58803 with 3.4 GB RAM
>>> 14/09/12 15:54:55 INFO cluster.SparkDeploySchedulerBackend: Registered
>>> executor:
>>> Actor[akka.tcp://sparkExecutor@ip-10-90-1-56.eu-west-1.compute.internal
>>> :56590/user/Executor#1456047585]
>>> with ID 9
>>> 
>>> On Sat, Sep 13, 2014 at 4:21 PM, Pat Ferrel <pa...@gmail.com>
>> wrote:
>>> 
>>>> It’s not an error I’ve seen but they can tend to be pretty cryptic.
>> Could
>>>> you post more of the stack trace?
>>>> 
>>>> On Sep 12, 2014, at 2:55 PM, Phil Wills <ot...@gmail.com> wrote:
>>>> 
>>>> I've tried on 1.0.1 and 1.0.2, updating the pom to 1.0.2 when running
> on
>>>> that.  I used the spark-ec2 scripts to set up the cluster.
>>>> 
>>>> I might be able to share the data I'll mull it over the weekend to make
>>>> sure there's nothing sensitive, or if there's a way I can transform it
>> to
>>>> that point.
>>>> 
>>>> Phil
>>>> 
>>>> 
>>>> On Fri, Sep 12, 2014 at 6:30 PM, Pat Ferrel <pa...@occamsmachete.com>
>>> wrote:
>>>> 
>>>>> The mahout pom says 1.0.1 but I’m running fine on 1.0.2
>>>>> 
>>>>> 
>>>>> On Sep 12, 2014, at 10:08 AM, Pat Ferrel <pa...@occamsmachete.com>
>> wrote:
>>>>> 
>>>>> Is it a mature Spark cluster, what version of Spark?
>>>>> 
>>>>> If you can share the data I can try it on mine.
>>>>> 
>>>>> On Sep 12, 2014, at 9:42 AM, Phil Wills <ot...@gmail.com> wrote:
>>>>> 
>>>>> I've been experimenting with the fairly new ItemSimilarityDriver,
> which
>>>> is
>>>>> working fine up until the point it tries to write out it's results.
>>>>> Initially I was getting an issue with the akka frameSize being too
>>> small,
>>>>> but after expanding that I'm now getting a much more cryptic error:
>>>>> 
>>>>> 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Failed to run
>>>> saveAsTextFile
>>>>> at TextDelimitedReaderWriter.scala:288
>>>>> Exception in thread "main" org.apache.spark.SparkException: Job
> aborted
>>>> due
>>>>> to stage failure: Task 8.0:3 failed 4 times, most recent failure: TID
>>> 448
>>>>> on host ip-10-105-176-77.eu-west-1.compute.internal failed for unknown
>>>>> reason
>>>>> 
>>>>> This is from the master node, but there doesn't seem to be anything
>> more
>>>>> intelligible in the slave node logs.
>>>>> 
>>>>> I've tried writing to the local file system as well as s3n and can see
>>>> it's
>>>>> not an access problem, as I am seeing a zero length file appear.
>>>>> 
>>>>> Thanks for any pointers and apologies if this would be better to ask
> on
>>>> the
>>>>> Spark list,
>>>>> 
>>>>> Phil
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 


Re: ItemSimilarityDriver failing to write text file

Posted by Phil Wills <ot...@gmail.com>.
No, by local I mean running on one a large ec2 box spun up by the same
script, but running the 'mahout spark-itemsimilarity' command without a
master specified, so that it runs locally to that box, so I'm confident
about the versions being the same in local to that box and distributed
across the cluster modes. Apologies for the lack of clarity.

Phil

On Tue, Sep 16, 2014 at 7:48 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> By local I assume you are talking about your dev machine, not one of the
> cluster machines.
>
> Excuse me if I’m stating the obvious but you are using two completely
> different Spark and Hadoop installations, one local and one remote. They
> could be completely different codebases. Just because you have configured
> Spark and Hadoop to execute locally doesn’t mean they work remotely. It
> sounds like you are using the CLI on your dev machine, which is set to run
> locally, and passing a remote Spark master URI and S3 URI to the local
> Mahout script. I would install and set up Mahout on your cluster master,
> make sure MAHOUT_LOCAL is not set there since you will be using a cluster,
> and execute the CLI from there.
>
> Furthermore are you sure that the remote Spark cluster can see the S3
> data? Ssh to the Spark master and do something like “hadoop fs -ls” or
> supply the URI to verify that the Hadoop config on the remote cluster,
> which is what the remote Spark will use, can get to the data.
>
>
> On Sep 15, 2014, at 2:28 PM, Phil Wills <ot...@gmail.com> wrote:
>
> The data and s3n file system is OK, since when I run 'locally' that's just
> without a master specified, but otherwise identically, it works fine. I've
> been using the spark-ec2 scripts to retrieve spark and hadoop, so had
> assumed that meant they were operating compatible versions, but I'm not
> specifying which hadoop to use explicitly, so I don't know if that has an
> effect.
>
> Phil
>
> On Mon, Sep 15, 2014 at 7:25 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>
> > It should handle this input—no surprise.
> >
> > Spark must be compiled for the correct version of Hadoop that you are
> > using (Mahout also). I’d make sure Spark is working properly with your
> HDFS
> > by trying one of their examples if you haven’t already. Running locally
> may
> > not be using the same version of Hadoop, have you checked that?
> >
> > A filenamePattern of ‘.*’ will get all files in
> > s3n://recommendation-logs/2014/09/06 and you have it set to search
> > recursively. Check to make sure this is what you want. Did you use the
> same
> > dir structure as you have on s3n when you ran locally? Since this driver
> > looks at text files it can think it is working on data if it finds “[\t,
> ]”
> > a tab, comma, or space in the line when it’s reading garbage so you
> should
> > be sure it is working on only the files you want. Tell it to look for
> only
> > a tab if that’s what you are using or use a regex to match the entire
> > filename like “^part.*” or “.*log”.
> >
> > I have not tested with s3n:// URIs. I assume you can read all these with
> > the hadoop tools like “hadoop fs -ls
> s3n://recommendation-logs/2014/09/06”?
> >
> > off-list I’ll send a link to epinions data formatted for Mahout. You can
> > try putting that in HDFS via sn3 and running it because I have tested
> that
> > on a cluster. It is all in one file though so if there is a problem in
> file
> > discovery it won’t show up.
> >
> >
> > On Sep 15, 2014, at 9:10 AM, Phil Wills <ot...@gmail.com> wrote:
> >
> > Tried running locally on a reasonably beefy machine and it worked fine.
> > Which is the toy data, you're referring to?
> >
> > JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64 SPARK_HOME=/root/spark
> > MAHOUT_HOME=. bin/mahout spark-itemsimilarity --input
> > s3n://recommendation-logs/2014/09/06 --output
> > s3n://recommendation-outputs/2014/09/06 --filenamePattern '.*'
> --recursive
> > --master spark://ec2-54-75-13-36.eu-west-1.compute.amazonaws.com:7077
> > --sparkExecutorMem 6g
> >
> > and the working version running locally on a beefier box:
> >
> > JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64 SPARK_HOME=/root/spark
> > MAHOUT_HOME=. MAHOUT_HEAPSIZE=16000 bin/mahout spark-itemsimilarity
> --input
> > s3n://ophan-recommendation-logs/2014/09/06 --output
> > s3n://ophan-recommendation-outputs/2014/09/06 --filenamePattern '.*'
> > --recursive  --sparkExecutorMem 16g
> >
> > Sample input:
> >
> > nnS1dIIBBtTnehVD79lgYeBw
> >
> >
> http://www.example.com/world/2014/sep/05/malaysia-airlines-mh370-six-months-chinese-families-lack-answers
> >
> > ikFSk14vHrTPqjSISvMihDUg
> >
> >
> http://www.example.com/world/2014/sep/05/obama-core-coalition-10-countries-to-fight-isis
> >
> > edqu8kfgsFSg2w3MhV5rUwuQ
> >
> >
> http://www.example.com/lifeandstyle/wordofmouth/2014/sep/05/food-and-drink2?CMP=fb_gu
> >
> > pfnmfONG1DQWG_EOOIxUASow
> >
> >
> http://www.example.com/world/live/2014/sep/05/unresponsive-plane-f15-jets-aircraft-live-updates
> >
> > pfUil_W0s2TZSqojMQrVcxVw        http://www.
> >
> >
> example.com/football/blog/2014/sep/05/jose-mourinho-bargain-loic-remy-chelsea-france
> >
> > nxTJnpyenFSP-tqWSLHQdW8w
> >
> http://www.example.com/books/2014/sep/05/were-we-happier-in-the-stone-age
> >
> > lba37jwJVQS5GbiSuus1i6tA
> >
> >
> http://www.example.com/stage/2014/sep/05/titus-andronicus-review-visually-striking-but-flawed
> >
> > bEHaOzZPbtQz-X2K1wortBQQ
> >
> >
> http://www.example.com/cities/2014/sep/05/death-america-suburban-dream-ferguson-missouri-resegregation
> >
> > gjTGzDXiDOT5W2SThhm0tUmg
> >
> >
> http://www.example.com/world/2014/sep/05/man-jailed-phoning-texting-ex-21807-times
> >
> > pfFbQ5ddvBRhm0XLZbN6Xd2A
> >
> >
> http://www.example.com/sport/2014/sep/05/gloucester-northampton-premiership-rugby
> >
> >
> >
> > On Sun, Sep 14, 2014 at 4:06 PM, Pat Ferrel <pa...@occamsmachete.com>
> wrote:
> >
> >> I wonder if it’s trying to write an empty rdd to a text file. Can you
> > give
> >> the CLI options and a snippet of data?
> >>
> >> Also have you successfully run this on the toy data in the resource dir?
> >> There is a script to run it locally that you can adapt for running on a
> >> cluster. This will eliminate any cluster problem.
> >>
> >>
> >> On Sep 13, 2014, at 1:13 PM, Phil Wills <ot...@gmail.com> wrote:
> >>
> >> Here's the master log from the line with the stack trace to termination:
> >>
> >> 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Failed to run
> > saveAsTextFile
> >> at TextDelimitedReaderWriter.scala:288
> >> Exception in thread "main" org.apache.spark.SparkException: Job aborted
> > due
> >> to stage failure: Task 8.0:3 failed 4 times, most recent failure: TID
> 448
> >> on host ip-10-105-176-77.eu-west-1.compute.internal failed for unknown
> >> reason
> >> Driver stacktrace:
> >> at org.apache.spark.scheduler.DAGScheduler.org
> >>
> >>
> >
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
> >> at
> >>
> >>
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
> >> at
> >>
> >>
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
> >> at
> >>
> >>
> >
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> >> at
> >>
> >
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
> >> at
> >>
> >>
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
> >> at
> >>
> >>
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
> >> at scala.Option.foreach(Option.scala:236)
> >> at
> >>
> >>
> >
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
> >> at
> >>
> >>
> >
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
> >> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> >> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> >> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> >> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> >> at
> >>
> >>
> >
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> >> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >> at
> >>
> >>
> >
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >> at
> > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >> at
> >>
> >>
> >
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >> 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Executor lost: 8 (epoch
> > 20)
> >> 14/09/12 15:54:55 INFO storage.BlockManagerMasterActor: Trying to remove
> >> executor 8 from BlockManagerMaster.
> >> 14/09/12 15:54:55 INFO storage.BlockManagerMaster: Removed 8
> successfully
> >> in removeExecutor
> >> 14/09/12 15:54:55 INFO storage.BlockManagerInfo: Registering block
> > manager
> >> ip-10-105-176-77.eu-west-1.compute.internal:58803 with 3.4 GB RAM
> >> 14/09/12 15:54:55 INFO cluster.SparkDeploySchedulerBackend: Registered
> >> executor:
> >> Actor[akka.tcp://sparkExecutor@ip-10-90-1-56.eu-west-1.compute.internal
> >> :56590/user/Executor#1456047585]
> >> with ID 9
> >>
> >> On Sat, Sep 13, 2014 at 4:21 PM, Pat Ferrel <pa...@gmail.com>
> > wrote:
> >>
> >>> It’s not an error I’ve seen but they can tend to be pretty cryptic.
> > Could
> >>> you post more of the stack trace?
> >>>
> >>> On Sep 12, 2014, at 2:55 PM, Phil Wills <ot...@gmail.com> wrote:
> >>>
> >>> I've tried on 1.0.1 and 1.0.2, updating the pom to 1.0.2 when running
> on
> >>> that.  I used the spark-ec2 scripts to set up the cluster.
> >>>
> >>> I might be able to share the data I'll mull it over the weekend to make
> >>> sure there's nothing sensitive, or if there's a way I can transform it
> > to
> >>> that point.
> >>>
> >>> Phil
> >>>
> >>>
> >>> On Fri, Sep 12, 2014 at 6:30 PM, Pat Ferrel <pa...@occamsmachete.com>
> >> wrote:
> >>>
> >>>> The mahout pom says 1.0.1 but I’m running fine on 1.0.2
> >>>>
> >>>>
> >>>> On Sep 12, 2014, at 10:08 AM, Pat Ferrel <pa...@occamsmachete.com>
> > wrote:
> >>>>
> >>>> Is it a mature Spark cluster, what version of Spark?
> >>>>
> >>>> If you can share the data I can try it on mine.
> >>>>
> >>>> On Sep 12, 2014, at 9:42 AM, Phil Wills <ot...@gmail.com> wrote:
> >>>>
> >>>> I've been experimenting with the fairly new ItemSimilarityDriver,
> which
> >>> is
> >>>> working fine up until the point it tries to write out it's results.
> >>>> Initially I was getting an issue with the akka frameSize being too
> >> small,
> >>>> but after expanding that I'm now getting a much more cryptic error:
> >>>>
> >>>> 14/09/12 15:54:55 INFO scheduler.DAGScheduler: Failed to run
> >>> saveAsTextFile
> >>>> at TextDelimitedReaderWriter.scala:288
> >>>> Exception in thread "main" org.apache.spark.SparkException: Job
> aborted
> >>> due
> >>>> to stage failure: Task 8.0:3 failed 4 times, most recent failure: TID
> >> 448
> >>>> on host ip-10-105-176-77.eu-west-1.compute.internal failed for unknown
> >>>> reason
> >>>>
> >>>> This is from the master node, but there doesn't seem to be anything
> > more
> >>>> intelligible in the slave node logs.
> >>>>
> >>>> I've tried writing to the local file system as well as s3n and can see
> >>> it's
> >>>> not an access problem, as I am seeing a zero length file appear.
> >>>>
> >>>> Thanks for any pointers and apologies if this would be better to ask
> on
> >>> the
> >>>> Spark list,
> >>>>
> >>>> Phil
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
>
>

Re: ItemSimilarityDriver failing to write text file

Posted by Pat Ferrel <pa...@occamsmachete.com>.
By local I assume you are talking about your dev machine, not one of the cluster machines.

Excuse me if I’m stating the obvious but you are using two completely different Spark and Hadoop installations, one local and one remote. They could be completely different codebases. Just because you have configured Spark and Hadoop to execute locally doesn’t mean they work remotely. It sounds like you are using the CLI on your dev machine, which is set to run locally, and passing a remote Spark master URI and S3 URI to the local Mahout script. I would install and set up Mahout on your cluster master, make sure MAHOUT_LOCAL is not set there since you will be using a cluster, and execute the CLI from there.

Furthermore are you sure that the remote Spark cluster can see the S3 data? Ssh to the Spark master and do something like “hadoop fs -ls” or supply the URI to verify that the Hadoop config on the remote cluster, which is what the remote Spark will use, can get to the data.


On Sep 15, 2014, at 2:28 PM, Phil Wills <ot...@gmail.com> wrote:



Re: ItemSimilarityDriver failing to write text file

Posted by Phil Wills <ot...@gmail.com>.
The data and the s3n file system are OK: when I run 'locally', that's just with no master specified but otherwise identical options, and it works fine. I've been using the spark-ec2 scripts to retrieve Spark and Hadoop, so I had assumed they were operating compatible versions, but I'm not specifying which Hadoop to use explicitly, so I don't know if that has an effect.
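
(For what it's worth, I think spark-ec2 takes a --hadoop-major-version flag, so I could presumably pin that explicitly when launching, something like:

./spark-ec2 --hadoop-major-version=2 ... launch my-cluster

but I haven't tried it yet.)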

Phil

On Mon, Sep 15, 2014 at 7:25 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:


Re: ItemSimilarityDriver failing to write text file

Posted by Pat Ferrel <pa...@occamsmachete.com>.
It should handle this input—no surprise. 

Spark must be compiled for the correct version of Hadoop that you are using (Mahout also). I’d make sure Spark is working properly with your HDFS by trying one of their examples if you haven’t already. Running locally may not be using the same version of Hadoop, have you checked that?

A filenamePattern of ‘.*’ will get all files in s3n://recommendation-logs/2014/09/06, and you have it set to search recursively. Check to make sure this is what you want. Did you use the same dir structure on s3n as you had when running locally? Since this driver reads text files, it can think it is working on real data whenever it finds “[\t, ]” (a tab, comma, or space) in a line, even when it’s reading garbage, so be sure it only sees the files you want. Tell it to look for only a tab if that’s what you are using, or use a regex that matches the entire filename, like “^part.*” or “.*log”.
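
For example, if your input files are Hadoop part files, something like this (other options as in your runs) would keep stray files out of the input:

bin/mahout spark-itemsimilarity \
  --input s3n://recommendation-logs/2014/09/06 \
  --output s3n://recommendation-outputs/2014/09/06 \
  --filenamePattern '^part.*' --recursive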

I have not tested with s3n:// URIs. I assume you can read all these with the hadoop tools like “hadoop fs -ls s3n://recommendation-logs/2014/09/06”?

Off-list I’ll send a link to the epinions data formatted for Mahout. You can try putting that in HDFS (or on s3n) and running it, because I have tested that on a cluster. It is all in one file, though, so if there is a problem in file discovery it won’t show up.


On Sep 15, 2014, at 9:10 AM, Phil Wills <ot...@gmail.com> wrote:



Re: ItemSimilarityDriver failing to write text file

Posted by Phil Wills <ot...@gmail.com>.
Tried running locally on a reasonably beefy machine and it worked fine.
Which toy data are you referring to?

JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64 SPARK_HOME=/root/spark
MAHOUT_HOME=. bin/mahout spark-itemsimilarity --input
s3n://recommendation-logs/2014/09/06 --output
s3n://recommendation-outputs/2014/09/06 --filenamePattern '.*' --recursive
--master spark://ec2-54-75-13-36.eu-west-1.compute.amazonaws.com:7077
--sparkExecutorMem 6g

and the working version running locally on a beefier box:

JAVA_HOME=/usr/lib/jvm/jre-1.7.0-openjdk.x86_64 SPARK_HOME=/root/spark
MAHOUT_HOME=. MAHOUT_HEAPSIZE=16000 bin/mahout spark-itemsimilarity --input
s3n://ophan-recommendation-logs/2014/09/06 --output
s3n://ophan-recommendation-outputs/2014/09/06 --filenamePattern '.*'
--recursive  --sparkExecutorMem 16g

Sample input:

nnS1dIIBBtTnehVD79lgYeBw
http://www.example.com/world/2014/sep/05/malaysia-airlines-mh370-six-months-chinese-families-lack-answers

ikFSk14vHrTPqjSISvMihDUg
http://www.example.com/world/2014/sep/05/obama-core-coalition-10-countries-to-fight-isis

edqu8kfgsFSg2w3MhV5rUwuQ
http://www.example.com/lifeandstyle/wordofmouth/2014/sep/05/food-and-drink2?CMP=fb_gu

pfnmfONG1DQWG_EOOIxUASow
http://www.example.com/world/live/2014/sep/05/unresponsive-plane-f15-jets-aircraft-live-updates

pfUil_W0s2TZSqojMQrVcxVw
http://www.example.com/football/blog/2014/sep/05/jose-mourinho-bargain-loic-remy-chelsea-france

nxTJnpyenFSP-tqWSLHQdW8w
http://www.example.com/books/2014/sep/05/were-we-happier-in-the-stone-age

lba37jwJVQS5GbiSuus1i6tA
http://www.example.com/stage/2014/sep/05/titus-andronicus-review-visually-striking-but-flawed

bEHaOzZPbtQz-X2K1wortBQQ
http://www.example.com/cities/2014/sep/05/death-america-suburban-dream-ferguson-missouri-resegregation

gjTGzDXiDOT5W2SThhm0tUmg
http://www.example.com/world/2014/sep/05/man-jailed-phoning-texting-ex-21807-times

pfFbQ5ddvBRhm0XLZbN6Xd2A
http://www.example.com/sport/2014/sep/05/gloucester-northampton-premiership-rugby



On Sun, Sep 14, 2014 at 4:06 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:


Re: ItemSimilarityDriver failing to write text file

Posted by Pat Ferrel <pa...@occamsmachete.com>.
I wonder if it’s trying to write an empty RDD to a text file. Can you give the CLI options and a snippet of data?

Also, have you successfully run this on the toy data in the resource dir? There is a script to run it locally that you can adapt for running on a cluster; that will rule out any cluster problem.
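
Adapting it is mostly a matter of adding a --master URI instead of letting it run locally, something like this (the data path, output path, and master host are placeholders):

bin/mahout spark-itemsimilarity \
  --input /path/to/toy-data --output /tmp/itemsim-out \
  --master spark://<your-master>:7077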


On Sep 13, 2014, at 1:13 PM, Phil Wills <ot...@gmail.com> wrote:



Re: ItemSimilarityDriver failing to write text file

Posted by Phil Wills <ot...@gmail.com>.
Here's the master log from the line with the stack trace to termination:

14/09/12 15:54:55 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile
at TextDelimitedReaderWriter.scala:288
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 8.0:3 failed 4 times, most recent failure: TID 448
on host ip-10-105-176-77.eu-west-1.compute.internal failed for unknown
reason
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/09/12 15:54:55 INFO scheduler.DAGScheduler: Executor lost: 8 (epoch 20)
14/09/12 15:54:55 INFO storage.BlockManagerMasterActor: Trying to remove
executor 8 from BlockManagerMaster.
14/09/12 15:54:55 INFO storage.BlockManagerMaster: Removed 8 successfully
in removeExecutor
14/09/12 15:54:55 INFO storage.BlockManagerInfo: Registering block manager
ip-10-105-176-77.eu-west-1.compute.internal:58803 with 3.4 GB RAM
14/09/12 15:54:55 INFO cluster.SparkDeploySchedulerBackend: Registered
executor:
Actor[akka.tcp://sparkExecutor@ip-10-90-1-56.eu-west-1.compute.internal:56590/user/Executor#1456047585]
with ID 9

On Sat, Sep 13, 2014 at 4:21 PM, Pat Ferrel <pa...@gmail.com> wrote:


Re: ItemSimilarityDriver failing to write text file

Posted by Pat Ferrel <pa...@gmail.com>.
It’s not an error I’ve seen, but these Spark errors can be pretty cryptic. Could you post more of the stack trace?

On Sep 12, 2014, at 2:55 PM, Phil Wills <ot...@gmail.com> wrote:



Re: ItemSimilarityDriver failing to write text file

Posted by Phil Wills <ot...@gmail.com>.
I've tried on 1.0.1 and 1.0.2, updating the pom to 1.0.2 when running on
that. I used the spark-ec2 scripts to set up the cluster.

I might be able to share the data; I'll mull it over the weekend to make
sure there's nothing sensitive, or see if there's a way I can transform it
to that point.

Phil


On Fri, Sep 12, 2014 at 6:30 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:


Re: ItemSimilarityDriver failing to write text file

Posted by Pat Ferrel <pa...@occamsmachete.com>.
The Mahout pom says Spark 1.0.1, but I’m running fine on 1.0.2.
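
If you want your build to match the cluster exactly, you can rebuild Mahout against that Spark version; this is just a sketch and assumes the pom exposes a spark.version property, so check yours first:

# from the mahout source root; -Dspark.version is an assumption, verify in pom.xml
mvn clean install -DskipTests -Dspark.version=1.0.2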


On Sep 12, 2014, at 10:08 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:




Re: ItemSimilarityDriver failing to write text file

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Is it a mature Spark cluster, and what version of Spark?

If you can share the data I can try it on mine.
