Posted to user@mahout.apache.org by Jeff Isenhart <je...@yahoo.com.INVALID> on 2015/03/12 04:24:55 UTC

demo spark-itemsimilarity; empty output

I am trying to run the example found here:
http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html


The data (demoItems.csv, added to HDFS) is just copied from the example:
u1,purchase,iphone
u1,purchase,ipad
u2,purchase,nexus
...
But when I run
mahout spark-itemsimilarity -i demoItems.csv -o output2 -fc 1 -ic 2
I get empty _SUCCESS and part-00000 files in
output2/indicator-matrix

Any ideas?

Re: demo spark-itemsimilarity; empty output

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Hmm, I just ran into this too; thanks for the research.

This may cause problems on cluster machines unless it is Mac-specific, so the library may need to go into /usr/lib/java on all nodes. I'm not sure that is the best solution. Let me know if you run into this on a 'nix-type cluster.



Re: demo spark-itemsimilarity; empty output

Posted by Jeff Isenhart <je...@yahoo.com.INVALID>.
OK, I fixed the snappy issue (which happens on Mac with JRE 1.7) by downloading https://wso2.org/jira/secure/attachment/32013/libsnappyjava.jnilib and placing the file in /usr/lib/java/.
Now, when I run
./bin/mahout spark-itemsimilarity -i demoItems.csv -o output4 -fc 1 -ic 2 --filter1 purchase --filter2 view
I get the desired output, just like the example.
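A quick sanity check along these lines (a sketch, not part of Mahout; the directory and file name are the ones mentioned above, so adjust them for your setup) is to confirm the native library is actually in place before re-running the driver:

```shell
# Sketch: check that the native Snappy JNI library is where the JVM
# loader can find it. Default path is the one used in this thread.
check_snappy() {
  dir="${1:-/usr/lib/java}"
  if [ -f "$dir/libsnappyjava.jnilib" ]; then
    echo "found"
  else
    echo "missing"
  fi
}

check_snappy /usr/lib/java
```

If this prints "missing", the UnsatisfiedLinkError below is expected.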



Re: demo spark-itemsimilarity; empty output

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Looks like you don’t have the native snappy code installed correctly. That’s a Hadoop thing, I think, used for fast compressed serialization.


Re: demo spark-itemsimilarity; empty output

Posted by Jeff Isenhart <je...@yahoo.com.INVALID>.
Thanks for the input, Pat. I ran the following command
./bin/mahout spark-itemsimilarity -i demoItems.csv -o output4 -fc 1 -ic 2 --filter1 purchase --filter2 view
on the data
u1,purchase,iphone
u1,purchase,ipad
u2,purchase,nexus
and am now seeing this error:
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317)
    at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
    at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
    at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
    at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
    at org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1029)
    at org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
    at org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
    at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:126)
    at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
    at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
    at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at org.apache.spark.util.collection.AppendOnlyMap$$anon$1.foreach(AppendOnlyMap.scala:159)
    at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
    at java.lang.Runtime.loadLibrary0(Runtime.java:849)
    at java.lang.System.loadLibrary(System.java:1088)
    at org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52)
    ... 26 more




Re: demo spark-itemsimilarity; empty output

Posted by Pat Ferrel <pa...@occamsmachete.com>.
There are many ways to structure the input. The spark-itemsimilarity driver can take only two actions, though the internal code, if you want to use it as a library, will take any number. The CLI driver can optionally take input of the form you mention, but will extract a primary and a single secondary action per execution. If you have more than two actions, you can run the driver once for every secondary action, or use the lib interface.

You can have your interactions in separate dirs of the form I mentioned in the original answer, in which case you pass in -i and -i2 params. If you want to mix actions in the same files, use the format you describe:

u1,item1,action1
u1,item10,action2
u1,item500,action3
u2,item2,action1
u2,item500,action3
...

The columns can be moved around and specified on the CLI. To use the above with the CLI you would have to process action1 and action2 with one execution, and action1 and action3 with another execution. This will create four outputs; the two “similarity-matrix” dirs will be identical. This would give you indicators for action1 (actually two identical indicators), action2, and action3.
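The two executions might look like the following sketch. The flags are the ones used elsewhere in this thread; the input file name, output dirs, and action names are placeholders, and the column indices assume rows of the form user,item,action with the same zero-based indexing as the demo command (-fc 2 for the action column, -ic 1 for the item column):

```shell
# Hypothetical sketch: one driver run per secondary action on a mixed
# log. Paths and action names are placeholders, not from the example.
RUN1="mahout spark-itemsimilarity -i mixedLog.csv -o out-a1-a2 -fc 2 -ic 1 --filter1 action1 --filter2 action2"
RUN2="mahout spark-itemsimilarity -i mixedLog.csv -o out-a1-a3 -fc 2 -ic 1 --filter1 action1 --filter2 action3"
# Shown with echo here; on a real cluster you would execute each command.
echo "$RUN1"
echo "$RUN2"
```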


Re: demo spark-itemsimilarity; empty output

Posted by Jeff Isenhart <je...@yahoo.com.INVALID>.
<pre>Hmmm, then what about the "How to Use Multiple Actions" section, which states:
For a mixed action log of the form:
u1,purchase,iphone
u1,purchase,ipad
u2,purchase,nexus</pre>
 


Re: demo spark-itemsimilarity; empty output

Posted by Pat Ferrel <pa...@occamsmachete.com>.
spark-itemsimilarity takes tuples of the form

user-id,item-id

You are looking at the collected input as a matrix. It would be collected from something of the form:
u1,item1
u1,item10
u1,item500
u2,item2
u2,item500
...
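One way to produce such tuples (a sketch, not part of the driver itself) is to project the user and item columns out of a raw log; here the user,action,item layout of the demo data is assumed:

```shell
# Sketch: turn a raw log with rows of user,action,item (the demo data
# layout) into the user-id,item-id tuples described above.
printf 'u1,purchase,iphone\nu1,purchase,ipad\nu2,purchase,nexus\n' > /tmp/demoItems.csv
awk -F, '{ print $1 "," $3 }' /tmp/demoItems.csv
# prints:
#   u1,iphone
#   u1,ipad
#   u2,nexus
```

In practice the driver does this column selection itself via -fc/-ic, so this is only to illustrate what the "collected" tuple form looks like.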

