Posted to user@mahout.apache.org by lastarsenal <la...@163.com> on 2015/05/22 08:50:28 UTC

mahout spark FilteredRDD problem

Hi, 


Recently I tried Mahout on Spark, for example:


./bin/mahout spark-itemsimilarity -i ${input} -o ${output} --master $MyMaster --sparkExecutorMem 2g


Then I hit an error: "Caused by: java.lang.ClassCastException: org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator cannot be cast to org.apache.spark.serializer.KryoRegistrator". It seems that our Spark version is NOT compatible with Mahout.
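For context, here is a minimal sketch (my own reconstruction, not Mahout's actual startup code) of how the registrator reaches Spark. Spark only receives the registrator as a class-name string, instantiates it on each executor, and casts the instance to KryoRegistrator; if the Mahout jar on the executor classpath was built against an incompatible Spark, that cast fails exactly as above.

import org.apache.spark.SparkConf

// a sketch of the Kryo wiring; both conf keys are standard Spark settings.
// Each executor loads the named class reflectively and casts it to
// org.apache.spark.serializer.KryoRegistrator, so a mahout jar compiled
// against a different Spark produces the ClassCastException above.
val conf = new SparkConf()
  .setAppName("spark-itemsimilarity")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator",
    "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")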


The Spark cluster is deployed by our ops team, so I have to follow whatever version they run. I did the following:
1. Modified <spark.version>1.1.1</spark.version> to <spark.version>1.3.0</spark.version> (our Spark version) in Mahout's pom.xml.
2. Ran mvn -DskipTests clean install
3. Got an error during the build:


[ERROR] spark/src/main/scala/org/apache/mahout/sparkbindings/drm/CheckpointedDrmSpark.scala:168: error: value saveAsSequenceFile is not a member of org.apache.mahout.sparkbindings.DrmRdd[K]
[ERROR]     rdd.saveAsSequenceFile(path)
[ERROR]         ^
[ERROR] spark/src/main/scala/org/apache/mahout/sparkbindings/drm/package.scala:26: error: object FilteredRDD is not a member of package org.apache.spark.rdd
[ERROR] import org.apache.spark.rdd.{FilteredRDD, RDD}


4. Checked Spark 1.3.0: FilteredRDD has been removed (see the sketch after this list).
5. Checked Spark 1.1.1: FilteredRDD is available.
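From what I can tell (a sketch under my own assumptions, not a tested fix), both compile errors might have mechanical workarounds on Spark 1.3: FilteredRDD was an internal class that Spark removed, and a plain RDD.filter produces an equivalent RDD, while saveAsSequenceFile stops resolving because the SequenceFile implicits changed in 1.3, so mapping to concrete Writable pairs first should let it resolve again. I have not verified this against the Mahout build:

import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

// FilteredRDD is gone in Spark 1.3; filter returns an equivalent,
// lazily evaluated RDD
def nonEmptyLines(rdd: RDD[String]): RDD[String] = rdd.filter(_.nonEmpty)

// converting to concrete Writable pairs lets saveAsSequenceFile resolve;
// IntWritable/Text is a hypothetical choice here, Mahout's real rows are
// (K, Vector) pairs
def saveRows(rdd: RDD[(Int, String)], path: String): Unit =
  rdd.map { case (k, v) => (new IntWritable(k), new Text(v)) }
    .saveAsSequenceFile(path)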


So, my question is: how can I solve this?


The full error details when I run ./bin/mahout spark-itemsimilarity -i ${input} -o ${output} --master $MyMaster --sparkExecutorMem 2g are as follows:
15/05/22 12:22:27 WARN TaskSetManager: Lost task 8.0 in stage 0.0 (TID 8, 182.118.21.30): java.io.IOException: org.apache.spark.SparkException: Failed to register classes with Kryo
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1008)
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
        at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
        at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
        at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
        at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:61)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:195)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.spark.SparkException: Failed to register classes with Kryo
        at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:105)
        at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:157)
        at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:119)
        at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:214)
        at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:177)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1005)
        ... 12 more
Caused by: java.lang.ClassCastException: org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator cannot be cast to org.apache.spark.serializer.KryoRegistrator
        at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$3.apply(KryoSerializer.scala:101)
        at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$3.apply(KryoSerializer.scala:101)
        at scala.Option.map(Option.scala:145)
        at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:101)
        ... 17 more


Re:Re: mahout spark FilteredRDD problem

Posted by lastarsenal <la...@163.com>.
Hi, Pat,
Thanks. I updated to the Mahout master and ran: ./bin/mahout spark-itemsimilarity -i ${input} -o ${output} --master $MyMaster --sparkExecutorMem 2g
The problem went away.


However, a new problem has come up:


15/06/05 11:37:46 WARN AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
	at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
	at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
	at scala.concurrent.Await$.result(package.scala:107)
	at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:194)
	at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:399)

Could somebody give some advice? It would be very much appreciated!
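One thing I may try myself (an assumption on my part, not a confirmed diagnosis): the [30 seconds] in the trace matches the default of Spark 1.x's spark.akka.askTimeout, which the askWithReply call in the trace uses, so raising the Akka timeouts might at least work around it:

import org.apache.spark.SparkConf

// a sketch of settings to experiment with, assuming a Spark 1.x cluster;
// both values are in seconds
val conf = new SparkConf()
  .set("spark.akka.askTimeout", "120") // default 30, used by askWithReply
  .set("spark.akka.timeout", "300")    // default 100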







Re: mahout spark FilteredRDD problem

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Yes, the master was updated last week. You can check the “commits” tab on github to see the change history. This puts the master a couple of commits ahead of the release. The master branch reflects the next release’s work-in-progress snapshot and currently produces jars as …0.10.1-SNAPSHOT. Remember not to build for Spark 1.3; leave the POMs as-is. This will build for 1.2.2 but, as far as my testing has shown, should run on 1.3.

In the next week or so there will be a branch that will build and run on Spark 1.3, but it will lack the Mahout Scala shell for a while.
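If it helps while testing, you can confirm at runtime which Spark the job actually binds to (a tiny sketch, nothing Mahout-specific; sc.version is a standard SparkContext member):

import org.apache.spark.{SparkConf, SparkContext}

// print the Spark version the driver is linked against at runtime, useful
// when the cluster version differs from the build version; assumes the
// master URL is supplied externally, e.g. by spark-submit --master
val sc = new SparkContext(new SparkConf().setAppName("version-check"))
println(s"Running on Spark ${sc.version}")
sc.stop()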





Re: mahout spark FilteredRDD problem

Posted by lastarsenal <la...@163.com>.
This problem is from a git clone of the Mahout master, not from Mahout 0.10. Maybe the master was updated recently? I will update and try again. Thank you!

Sent from my iPhone



Re: mahout spark FilteredRDD problem

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Mahout 0.10.0 runs on Spark 1.1.1 or below _only_.

If you are only using spark-itemsimilarity, you can try the unreleased master, which is being moved to Spark 1.2.2 and is binary compatible with Spark 1.3. Get the latest master branch from https://github.com/apache/mahout and build from source. Leave the version at Spark 1.2.2; changing the POM to Spark 1.3 will cause compile errors like the ones below.

I’ve run a few tests on Spark 1.3, but if you’d like to try it too, please report back what you find.

BTW, the use of Guava has been removed, so the source examples are still being updated. If you are using the command line you should be fine.
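Concretely, that workflow is roughly this (a sketch assuming a stock git and Maven setup):

git clone https://github.com/apache/mahout.git
cd mahout
# leave <spark.version> in the POMs at 1.2.2; do not bump it to 1.3.0
mvn -DskipTests clean install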

