Posted to user@mahout.apache.org by Jeff Isenhart <je...@yahoo.com.INVALID> on 2015/03/05 06:21:36 UTC

mahout spark-itemsimilarity from command line

I am having an issue getting a simple itemsimilarity example to work. I know hadoop is up and functional (I ran the example mapreduce program, anyway).
But when I run either of these
./mahout spark-itemsimilarity -i "SomeDir/transactions.csv" -o "hdfs://localhost:9000/users/someuser/output" -fc 1 -ic 2
./mahout spark-itemsimilarity -i "SomeDir/transactions.csv" -o "SomeDir/output" -fc 1 -ic 2
I get
Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:166)
    at org.apache.mahout.common.HDFSPathSearch.<init>(HDFSPathSearch.scala:36)
    at org.apache.mahout.drivers.ItemSimilarityDriver$.readIndexedDatasets(ItemSimilarityDriver.scala:152)
    at org.apache.mahout.drivers.ItemSimilarityDriver$.process(ItemSimilarityDriver.scala:213)
    at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:116)
    at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:114)
    at scala.Option.map(Option.scala:145)
    at org.apache.mahout.drivers.ItemSimilarityDriver$.main(ItemSimilarityDriver.scala:114)
    at org.apache.mahout.drivers.ItemSimilarityDriver.main(ItemSimilarityDriver.scala)
I am guessing there are some config settings I am missing.
Using
mahout 1.0 Snapshot
hadoop 2.6.0

Re: mahout spark-itemsimilarity from command line

Posted by Jeff Isenhart <je...@yahoo.com.INVALID>.
OK, so the solution to the issue was to add the following to my core-site.xml:
<!-- Added to try and solve mahout issue claiming 'No FileSystem for scheme: hdfs' -->
<property>
    <name>fs.file.impl</name>
    <value>org.apache.hadoop.fs.LocalFileSystem</value>
    <description>The FileSystem for file: uris.</description>
</property>
<property>
    <name>fs.hdfs.impl</name>
    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
    <description>The FileSystem for hdfs: uris.</description>
</property>
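
To confirm that a given core-site.xml actually carries both mappings, a small check along the following lines may help. This is only a sketch in Python, not part of Mahout or Hadoop: the function names and the SAMPLE document are made up for illustration, and only the two property names and class names come from the fix above.

```python
import xml.etree.ElementTree as ET

# Property names and class names taken from the fix above.
EXPECTED = {
    "fs.file.impl": "org.apache.hadoop.fs.LocalFileSystem",
    "fs.hdfs.impl": "org.apache.hadoop.hdfs.DistributedFileSystem",
}


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def missing_filesystem_mappings(xml_text, expected=EXPECTED):
    """List expected fs.*.impl entries that are absent or point at a different class."""
    props = read_properties(xml_text)
    return sorted(name for name, cls in expected.items() if props.get(name) != cls)


# Hypothetical core-site.xml content, used only to exercise the check.
SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>fs.hdfs.impl</name>
    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    # Reports which of the two mappings still need to be added.
    print(missing_filesystem_mappings(SAMPLE))
```

An empty list means both entries are present. Note that the entries only name the classes; the jars that contain them still have to be on the classpath of whatever process reads the config.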


Re: mahout spark-itemsimilarity from command line

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Mahout was on Spark 1.1.0 until last week and is on 1.1.1 as of current master. Running locally should use these, but make sure they are installed if you run with anything other than --master local.

The next thing to try is to see which version of Hadoop both Mahout and Spark are compiled for; it must be the one you have installed. Check the build instructions for Spark at https://spark.apache.org/docs/latest/building-spark.html (this page is for 1.2.1, but make sure you have the source for 1.1.0 or 1.1.1)
and for Mahout at http://mahout.apache.org/developers/buildingmahout.html


Re: mahout spark-itemsimilarity from command line

Posted by Jeff Isenhart <je...@yahoo.com.INVALID>.
Here is what I get with hadoop fs -ls
-rw-r--r--   1 username supergroup    5510526 2015-03-09 11:10 transactions.csv
Yes, I am trying to run a local version of Spark (trying to run everything local at the moment).
When I run
./bin/mahout spark-itemsimilarity -i transactions.csv -o output -fc 1 -ic 2
I get
15/03/09 11:18:30 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@10.0.1.20:50565/user/HeartbeatReceiver
Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:166)
    at org.apache.mahout.common.HDFSPathSearch.<init>(HDFSPathSearch.scala:36)
    at org.apache.mahout.drivers.ItemSimilarityDriver$.readIndexedDatasets(ItemSimilarityDriver.scala:152)
    at org.apache.mahout.drivers.ItemSimilarityDriver$.process(ItemSimilarityDriver.scala:213)
    at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:116)
    at org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:114)
    at scala.Option.map(Option.scala:145)
    at org.apache.mahout.drivers.ItemSimilarityDriver$.main(ItemSimilarityDriver.scala:114)
    at org.apache.mahout.drivers.ItemSimilarityDriver.main(ItemSimilarityDriver.scala)


Re: mahout spark-itemsimilarity from command line

Posted by Pat Ferrel <pa...@occamsmachete.com>.
From the command line can you run:

    hadoop fs -ls

And see SomeDir/transactions.csv? It looks like HDFS is not accessible from wherever you are running spark-itemsimilarity.

Are you trying to run a local version of Spark? I ask because the default is "--master local". This can still access a clustered HDFS if you are configured to access it from your machine.
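
One way to read the stack trace is that a path given without a scheme is resolved against Hadoop's default filesystem, which would explain why even the command with a plain "SomeDir/output" ends up asking for an hdfs FileSystem. The Python sketch below is a simplified model of that resolution, not Hadoop's actual code. The function name qualify, the default filesystem URI, and the user name are all assumptions for illustration, and relative paths are modelled as living under /user/<user>, as they would when the default filesystem is HDFS.

```python
from urllib.parse import urlparse


def qualify(path, default_fs="hdfs://localhost:9000", user="someuser"):
    """Simplified model: return a fully qualified URI for a Hadoop-style path.

    default_fs and user are illustrative assumptions, not values read from
    any real configuration.
    """
    if urlparse(path).scheme:
        # The path already names its filesystem (hdfs://, file://, ...).
        return path
    base = default_fs.rstrip("/")
    if path.startswith("/"):
        # Absolute path on the default filesystem.
        return base + path
    # Relative path: modelled as relative to the user's HDFS home directory.
    return f"{base}/user/{user}/{path}"


if __name__ == "__main__":
    for p in ("transactions.csv", "SomeDir/output", "file:///tmp/output"):
        print(p, "->", qualify(p))
```

Under this model, only a path with an explicit scheme such as file:// avoids the default filesystem.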



Re: mahout spark-itemsimilarity from command line

Posted by Jeff Isenhart <je...@yahoo.com.INVALID>.
bump...anybody??? 
