Posted to user@mahout.apache.org by Frank Scholten <fr...@frankscholten.nl> on 2014/09/12 15:46:22 UTC

Using ItemSimilarity.scala from Java

Hi all,

Trying out the new spark-itemsimilarity code, but I am new to Scala and have a hard time calling certain methods from Java.

Here is a Gist with a Java main that runs the cooccurrence analysis:

https://gist.github.com/frankscholten/d373c575ad721dd0204e

When I run this I get an exception:

Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.mahout.drivers.TextDelimitedIndexedDatasetReader.readElementsFrom(Ljava/lang/String;Lcom/google/common/collect/BiMap;)Lorg/apache/mahout/drivers/IndexedDataset;

What do I have to do here to use the Scala readers from Java?

Cheers,

Frank

Re: Using ItemSimilarity.scala from Java

Posted by Pat Ferrel <pa...@occamsmachete.com>.
I think if you use the mahout launcher script with -spark and your main class it should add the spark module’s jar to the classpath. This may even solve your problem. See the mahout script for its classpath manipulation.


On Sep 12, 2014, at 8:53 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:

#1 I’m glad to see someone using this. I haven’t tried calling Scala from Java and would expect a fair amount of difficulty with it. Scala constructs objects to deal with its new features (anonymous functions, traits, implicits), and you have to guess at what those will look like to Java. Maybe you could try the Scala community.

IntelliJ will auto-convert Java to Scala when you paste it into a .scala file. For some reason yours doesn’t seem to convert cleanly, but I’ve seen it work pretty well.

I started to convert your code and it pointed out a bug in mine, a bad value in the default schema. I’d be interested in helping with this as a way to work out the kinks in creating drivers.

Are you interested in this, or are you set on using Java? Either way I’ll post a gist of your code using the MahoutDriver as the template and converted to Scala. It’ll take me a few minutes.

On Sep 12, 2014, at 6:46 AM, Frank Scholten <fr...@frankscholten.nl> wrote:

Hi all,

Trying out the new spark-itemsimilarity code, but I am new to Scala and have a hard time calling certain methods from Java.

Here is a Gist with a Java main that runs the cooccurrence analysis:

https://gist.github.com/frankscholten/d373c575ad721dd0204e

When I run this I get an exception:

Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.mahout.drivers.TextDelimitedIndexedDatasetReader.readElementsFrom(Ljava/lang/String;Lcom/google/common/collect/BiMap;)Lorg/apache/mahout/drivers/IndexedDataset;

What do I have to do here to use the Scala readers from Java?

Cheers,

Frank



Re: Using ItemSimilarity.scala from Java

Posted by Frank Scholten <fr...@frankscholten.nl>.
When I replaced the TextDelimitedIndexedDatasetReader declaration with TDIndexedDatasetReader I no longer get the NoSuchMethodError and the process continues.
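
For anyone who hits the same NoSuchMethodError: the change is just programming against the trait type instead of the concrete class. A rough Java sketch of the before/after (the readElementsFrom signature is taken from the error above; the constructor arguments, the readSchema/mc variables, and the BiMap type parameters are placeholders and assumptions, so check them against the gist and the Mahout sources):

import com.google.common.collect.BiMap;
import com.google.common.collect.HashBiMap;
import org.apache.mahout.drivers.IndexedDataset;                    // package per the error message
import org.apache.mahout.drivers.TextDelimitedIndexedDatasetReader; // package per the error message
import org.apache.mahout.drivers.TDIndexedDatasetReader;            // assumed to live in the same package

// Before: declaring the concrete class is what triggered the NoSuchMethodError
// TextDelimitedIndexedDatasetReader reader = new TextDelimitedIndexedDatasetReader(readSchema, mc);

// After: declare the variable as the Scala trait, which Java sees as a plain interface,
// so the readElementsFrom call dispatches through the interface method.
// readSchema and mc stand for whatever schema and distributed context the main already builds.
TDIndexedDatasetReader reader = new TextDelimitedIndexedDatasetReader(readSchema, mc);

BiMap<String, Integer> itemIDs = HashBiMap.create();  // type parameters are a guess
IndexedDataset interactions = reader.readElementsFrom("article_views.txt", itemIDs);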

But now I end up with another error. Any idea what is going on here? Is it
because I am trying to print the contents of the Matrix at the end and I
should process them differently?

2014-09-26 13:36:30,605 DEBUG Logging$class - Task 5's epoch is 3
2014-09-26 13:36:30,605 DEBUG Logging$class - Fetching outputs for shuffle 2, reduce 0
2014-09-26 13:36:30,605 DEBUG Logging$class - Fetching map output location for shuffle 2, reduce 0 took 0 ms
2014-09-26 13:36:30,605 INFO  Logging$class - maxBytesInFlight: 50331648, targetRequestSize: 10066329
2014-09-26 13:36:30,605 INFO  Logging$class - Getting 1 non-empty blocks out of 1 blocks
2014-09-26 13:36:30,605 INFO  Logging$class - Started 0 remote fetches in 0 ms
2014-09-26 13:36:30,606 DEBUG Logging$class - Got local block shuffle_2_0_0
2014-09-26 13:36:30,606 DEBUG Logging$class - Got local blocks in  1 ms ms
2014-09-26 13:36:30,662 ERROR Logging$class - Exception in task ID 5
java.io.NotSerializableException: org.apache.mahout.math.DenseVector
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:71)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:193)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2014-09-26 13:36:30,667 DEBUG Logging$class - parentName: , name: TaskSet_4, runningTasks: 0
2014-09-26 13:36:30,669 WARN  Logging$class - Lost TID 5 (task 4.0:0)
2014-09-26 13:36:30,671 ERROR Logging$class - Task 4.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.mahout.math.DenseVector; not retrying
2014-09-26 13:36:30,672 INFO  Logging$class - Removed TaskSet 4.0, whose tasks have all completed, from pool
2014-09-26 13:36:30,676 INFO  Logging$class - Cancelling stage 4
2014-09-26 13:36:30,680 DEBUG Logging$class - After removal of stage 5, remaining stages = 1
2014-09-26 13:36:30,680 INFO  Logging$class - Failed to run reduce at SparkEngine.scala:72
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 4.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.mahout.math.DenseVector
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
2014-09-26 13:36:30,681 DEBUG Logging$class - Removing running stage 4
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2014-09-26 13:36:30,682 DEBUG Logging$class - Removing pending status for stage 4
2014-09-26 13:36:30,682 DEBUG Logging$class - After removal of stage 4, remaining stages = 0
2014-09-26 13:36:30,683 DEBUG FileSystem - Starting clear of FileSystem cache with 1 elements.
2014-09-26 13:36:30,684 DEBUG FileSystem - Removing filesystem for file:///
2014-09-26 13:36:30,684 DEBUG FileSystem - Removing filesystem for file:///
2014-09-26 13:36:30,684 DEBUG FileSystem - Done clearing cache
2014-09-26 13:36:30,685 DEBUG Logging$class - Shutdown hook called
Disconnected from the target VM, address: '127.0.0.1:53897', transport: 'socket'
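
(Side note on the trace: the JavaSerializationStream frames show the task result going through Spark's default Java serializer, and org.apache.mahout.math.DenseVector is not java.io.Serializable, which is exactly what the exception says. As far as I know, Mahout's Spark bindings normally run with Kryo and register their own serializers for the math classes instead. Below is a minimal, untested sketch of wiring that up when the context is built; the registrator class name is an assumption based on the spark bindings module, so verify it against the sources before relying on it.)

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class KryoContextSketch {
  public static void main(String[] args) {
    // Use Kryo and register Mahout's vector/matrix serializers so results
    // such as DenseVector can be serialized back to the driver.
    SparkConf conf = new SparkConf()
        .setAppName("cooccurrence-from-java")
        .setMaster("local")
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .set("spark.kryo.registrator",
            "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator"); // assumed class name
    JavaSparkContext sc = new JavaSparkContext(conf);
    // ... hand this conf (or context) to whatever builds the Mahout distributed context ...
    sc.stop();
  }
}

If memory serves, the Scala bindings' mahoutSparkContext helper applies these settings automatically, so the question may simply be how the context gets created from the Java main.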

On Fri, Sep 12, 2014 at 7:05 PM, Pat Ferrel <pa...@gmail.com> wrote:

> True but a bit daunting to get started.
>
> Here is a translation to Scala.
> https://gist.github.com/pferrel/9cfee8b5723bb2e2a22c
>
> It uses the MahoutDriver and IndexedDataset and is compiled from
> org.apache.mahout.examples, which I created and so you’ll need to add the
> right imports if you do it somewhere else. For a bonus it uses Spark's
> parallel writing to part files and you can add command line parsing quite
> easily.
>
> article_views.txt:
> pat,article1
> pat,article2
> pat,article3
> frank,article3
> frank,article4
> joe-bob,article10
> joe-bob,article11
>
> indicators/part-00000
> article2        article1:3.819085009768877 article3:1.046496287529096
> article3        article2:1.046496287529096 article4:1.046496287529096
> article1:1.046496287529096
> article11       article10:3.819085009768877
> article4        article3:1.046496287529096
> article10       article11:3.819085009768877
> article1        article2:3.819085009768877 article3:1.046496287529096
>
> The search using frank’s history will return article2, article3(filter
> out), article4(filter out), and article 1 as you’d expect.
>
> Oh, and I was wrong about the bug—works from the current repo.
>
> You still need to get the right jars in the classpath when running from
> the command line
>
> On Sep 12, 2014, at 9:04 AM, Peter Wolf <op...@gmail.com> wrote:
>
> I'm new here, but I just wanted to add that Scala is extremely cool.  I've
> moved to Scala wherever possible in my work.  It's really nice, and well
> worth effort to learn.  Scala has put the joy back into programming.
>
> Instead of trying to call Scala from Java, perhaps you might enjoy writing
> your stuff in Scala.
>
> On Fri, Sep 12, 2014 at 11:53 AM, Pat Ferrel <pa...@occamsmachete.com>
> wrote:
>
> > #1 I’m glad to see someone using this. I haven’t tried calling Scala from
> > Java and would expect a fair amount of difficulty with it. Scala
> constructs
> > objects to deal with its new features (anonymous functions, traits,
> > implicits) and you have to guess at what those will look like to java.
> > Maybe you could try the Scala community.
> >
> > Intellij will auto convert java to scala when you paste it into a .scala
> > file. For some reason yours doesn’t seem to work but I’ve seen it work
> > pretty well.
> >
> > I started to convert your code and it pointed out a bug in mine, a bad
> > value in the default schema. I’d be interested in helping with this as a
> > way to work out the kinks in creating drivers.
> >
> > Are you interested in this or are you set on using java? Either way I’ll
> > post a gist of your code using the MahoutDriver as the template and
> > converted to Scala. It’ll take me a few minutes.
> >
> > On Sep 12, 2014, at 6:46 AM, Frank Scholten <fr...@frankscholten.nl>
> > wrote:
> >
> > Hi all,
> >
> > Trying out the new spark-itemsimilarity code, but I am new to Scala and
> > have hard time calling certain methods from Java.
> >
> > Here is a Gist with a Java main that runs the cooccurrence analysis:
> >
> > https://gist.github.com/frankscholten/d373c575ad721dd0204e
> >
> > When I run this I get an exception:
> >
> > Exception in thread "main" java.lang.NoSuchMethodError:
> >
> >
> org.apache.mahout.drivers.TextDelimitedIndexedDatasetReader.readElementsFrom(Ljava/lang/String;Lcom/google/common/collect/BiMap;)Lorg/apache/mahout/drivers/IndexedDataset;
> >
> > What do I have to do here to use the Scala readers from Java?
> >
> > Cheers,
> >
> > Frank
> >
> >
>
>

Re: Using ItemSimilarity.scala from Java

Posted by Pat Ferrel <pa...@gmail.com>.
True but a bit daunting to get started. 

Here is a translation to Scala. https://gist.github.com/pferrel/9cfee8b5723bb2e2a22c

It uses the MahoutDriver and IndexedDataset and is compiled in org.apache.mahout.examples, a package I created, so you’ll need to add the right imports if you put it somewhere else. As a bonus it uses Spark’s parallel writing to part files, and you can add command-line parsing quite easily.

article_views.txt:
pat,article1
pat,article2
pat,article3
frank,article3
frank,article4
joe-bob,article10
joe-bob,article11

indicators/part-00000
article2	article1:3.819085009768877 article3:1.046496287529096
article3	article2:1.046496287529096 article4:1.046496287529096 article1:1.046496287529096
article11	article10:3.819085009768877
article4	article3:1.046496287529096
article10	article11:3.819085009768877
article1	article2:3.819085009768877 article3:1.046496287529096

The search using frank’s history will return article2, article3 (filtered out), article4 (filtered out), and article1, as you’d expect.

Oh, and I was wrong about the bug; it works from the current repo.

You still need to get the right jars on the classpath when running from the command line.

On Sep 12, 2014, at 9:04 AM, Peter Wolf <op...@gmail.com> wrote:

I'm new here, but I just wanted to add that Scala is extremely cool. I've moved to Scala wherever possible in my work. It's really nice, and well worth the effort to learn. Scala has put the joy back into programming.

Instead of trying to call Scala from Java, perhaps you might enjoy writing
your stuff in Scala.

On Fri, Sep 12, 2014 at 11:53 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> #1 I’m glad to see someone using this. I haven’t tried calling Scala from
> Java and would expect a fair amount of difficulty with it. Scala constructs
> objects to deal with its new features (anonymous functions, traits,
> implicits) and you have to guess at what those will look like to java.
> Maybe you could try the Scala community.
> 
> Intellij will auto convert java to scala when you paste it into a .scala
> file. For some reason yours doesn’t seem to work but I’ve seen it work
> pretty well.
> 
> I started to convert your code and it pointed out a bug in mine, a bad
> value in the default schema. I’d be interested in helping with this as a
> way to work out the kinks in creating drivers.
> 
> Are you interested in this or are you set on using java? Either way I’ll
> post a gist of your code using the MahoutDriver as the template and
> converted to Scala. It’ll take me a few minutes.
> 
> On Sep 12, 2014, at 6:46 AM, Frank Scholten <fr...@frankscholten.nl>
> wrote:
> 
> Hi all,
> 
> Trying out the new spark-itemsimilarity code, but I am new to Scala and
> have hard time calling certain methods from Java.
> 
> Here is a Gist with a Java main that runs the cooccurrence analysis:
> 
> https://gist.github.com/frankscholten/d373c575ad721dd0204e
> 
> When I run this I get an exception:
> 
> Exception in thread "main" java.lang.NoSuchMethodError:
> 
> org.apache.mahout.drivers.TextDelimitedIndexedDatasetReader.readElementsFrom(Ljava/lang/String;Lcom/google/common/collect/BiMap;)Lorg/apache/mahout/drivers/IndexedDataset;
> 
> What do I have to do here to use the Scala readers from Java?
> 
> Cheers,
> 
> Frank
> 
> 


Re: Using ItemSimilarity.scala from Java

Posted by Peter Wolf <op...@gmail.com>.
I'm new here, but I just wanted to add that Scala is extremely cool. I've moved to Scala wherever possible in my work. It's really nice, and well worth the effort to learn. Scala has put the joy back into programming.

Instead of trying to call Scala from Java, perhaps you might enjoy writing
your stuff in Scala.

On Fri, Sep 12, 2014 at 11:53 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> #1 I’m glad to see someone using this. I haven’t tried calling Scala from
> Java and would expect a fair amount of difficulty with it. Scala constructs
> objects to deal with its new features (anonymous functions, traits,
> implicits) and you have to guess at what those will look like to java.
> Maybe you could try the Scala community.
>
> Intellij will auto convert java to scala when you paste it into a .scala
> file. For some reason yours doesn’t seem to work but I’ve seen it work
> pretty well.
>
> I started to convert your code and it pointed out a bug in mine, a bad
> value in the default schema. I’d be interested in helping with this as a
> way to work out the kinks in creating drivers.
>
> Are you interested in this or are you set on using java? Either way I’ll
> post a gist of your code using the MahoutDriver as the template and
> converted to Scala. It’ll take me a few minutes.
>
> On Sep 12, 2014, at 6:46 AM, Frank Scholten <fr...@frankscholten.nl>
> wrote:
>
> Hi all,
>
> Trying out the new spark-itemsimilarity code, but I am new to Scala and
> have hard time calling certain methods from Java.
>
> Here is a Gist with a Java main that runs the cooccurrence analysis:
>
> https://gist.github.com/frankscholten/d373c575ad721dd0204e
>
> When I run this I get an exception:
>
> Exception in thread "main" java.lang.NoSuchMethodError:
>
> org.apache.mahout.drivers.TextDelimitedIndexedDatasetReader.readElementsFrom(Ljava/lang/String;Lcom/google/common/collect/BiMap;)Lorg/apache/mahout/drivers/IndexedDataset;
>
> What do I have to do here to use the Scala readers from Java?
>
> Cheers,
>
> Frank
>
>

Re: Using ItemSimilarity.scala from Java

Posted by Pat Ferrel <pa...@occamsmachete.com>.
#1 I’m glad to see someone using this. I haven’t tried calling Scala from Java and would expect a fair amount of difficulty with it. Scala constructs objects to deal with its new features (anonymous functions, traits, implicits), and you have to guess at what those will look like to Java. Maybe you could try the Scala community.
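
(To make the trait point concrete, here is a toy illustration with invented names, nothing Mahout-specific. In Scala 2.10 a trait with a concrete method compiles to a plain Java interface plus a hidden Trait$class holding a static implementation, and concrete classes get forwarder methods, so from Java the safest bet is usually to program against the trait, i.e. the interface type.)

// Scala side (hypothetical example):
//   trait Greeter { def greet(name: String): String = "hi " + name }
//   class LoudGreeter extends Greeter {
//     override def greet(name: String): String = super.greet(name).toUpperCase
//   }
//
// What Java sees: an interface Greeter, a synthetic Greeter$class holding a static
// greet(Greeter, String) implementation, and forwarder methods on LoudGreeter.
// Calling through the interface keeps the Java caller insulated from those details:
Greeter greeter = new LoudGreeter();
String greeting = greeter.greet("frank");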

IntelliJ will auto-convert Java to Scala when you paste it into a .scala file. For some reason yours doesn’t seem to convert cleanly, but I’ve seen it work pretty well.

I started to convert your code and it pointed out a bug in mine, a bad value in the default schema. I’d be interested in helping with this as a way to work out the kinks in creating drivers.

Are you interested in this, or are you set on using Java? Either way I’ll post a gist of your code using the MahoutDriver as the template and converted to Scala. It’ll take me a few minutes.

On Sep 12, 2014, at 6:46 AM, Frank Scholten <fr...@frankscholten.nl> wrote:

Hi all,

Trying out the new spark-itemsimilarity code, but I am new to Scala and have a hard time calling certain methods from Java.

Here is a Gist with a Java main that runs the cooccurrence analysis:

https://gist.github.com/frankscholten/d373c575ad721dd0204e

When I run this I get an exception:

Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.mahout.drivers.TextDelimitedIndexedDatasetReader.readElementsFrom(Ljava/lang/String;Lcom/google/common/collect/BiMap;)Lorg/apache/mahout/drivers/IndexedDataset;

What do I have to do here to use the Scala readers from Java?

Cheers,

Frank