Posted to user@mahout.apache.org by Ying Liao <yl...@gmail.com> on 2013/01/11 22:41:26 UTC

is Hadoop based SVD_ALS a complete feature?

I am trying factorize-movielens-1M.sh. I first found a bug in the .sh file.
Then I found a bug in org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter:
the argMap is not mapped. Now I have hit a third bug:
[cloudera@localhost trunk]$ hadoop jar \
    /home/cloudera/workspace/Mahout/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar \
    org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter \
    --input /user/cloudera/ratings.csv --output /user/cloudera/dataset \
    --trainingPercentage 0.9 --probePercentage 0.1 \
    --tempDir /user/cloudera/dataset/tmp
13/01/11 16:37:30 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/user/cloudera/ratings.csv], --output=[/user/cloudera/dataset], --probePercentage=[0.1], --startPhase=[0], --tempDir=[/user/cloudera/dataset/tmp], --trainingPercentage=[0.9]}
13/01/11 16:37:30 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/01/11 16:37:30 WARN conf.Configuration: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
13/01/11 16:37:30 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at org.apache.mahout.common.HadoopUtil.getCustomJobName(HadoopUtil.java:166)
    at org.apache.mahout.common.AbstractJob.prepareJob(AbstractJob.java:553)
    at org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter.run(DatasetSplitter.java:85)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter.main(DatasetSplitter.java:62)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Any help is appreciated.

Thanks,
Ying
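
For context on what the command in this message asks for: the flags
--trainingPercentage 0.9 and --probePercentage 0.1 request that roughly 90%
of the ratings go to a training set and 10% to a probe (hold-out) set. A
minimal Python sketch of such a split follows, assuming one independent
random draw per rating. The function name split_ratings and the sample data
are hypothetical, and this is not the Mahout implementation.

```python
import random

def split_ratings(ratings, training_pct=0.9, probe_pct=0.1, seed=42):
    """Assign each rating to 'training' or 'probe' by one random draw.

    Assumption: records are assigned independently; a draw that falls above
    training_pct + probe_pct leaves the record in neither set.
    """
    if training_pct < 0 or probe_pct < 0 or training_pct + probe_pct > 1.0:
        raise ValueError("percentages must be non-negative and sum to at most 1")
    rng = random.Random(seed)
    training, probe = [], []
    for record in ratings:
        draw = rng.random()  # uniform in [0, 1)
        if draw < training_pct:
            training.append(record)
        elif draw < training_pct + probe_pct:
            probe.append(record)
    return training, probe

# Hypothetical (user, item, rating) triples, for illustration only.
sample = [(u, i, 3.0) for u in range(100) for i in range(100)]
train, probe = split_ratings(sample)
print(len(train), len(probe))
```

Because the assignment is random, the split sizes only approximate the
requested percentages; they get closer as the number of ratings grows.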

Re: is Hadoop based SVD_ALS a complete feature?

Posted by Ying Liao <yl...@gmail.com>.
The last problem was due to a Hadoop version conflict. Thanks, Sebastian. I
updated the POM with the Hadoop version I am using and recompiled, and the
error is gone.
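
A note on why a version mismatch produces this particular error:
org.apache.hadoop.mapreduce.JobContext is a class in Hadoop 1.x and an
interface in Hadoop 2.x, so "Found interface ... but class was expected"
suggests the Mahout jar was compiled against the class flavor and run against
the interface flavor. The Python sketch below reads the access flags of a
.class file inside a jar to tell which flavor a given Hadoop jar ships. The
helper names are hypothetical, and the jar path is whatever the caller
supplies; nothing here is taken from Mahout or Hadoop tooling.

```python
import struct
import zipfile

ACC_INTERFACE = 0x0200  # access flag set on interfaces in the class file format

# Payload sizes in bytes (after the 1-byte tag) for fixed-size constant pool entries.
_CP_SIZES = {3: 4, 4: 4, 5: 8, 6: 8, 7: 2, 8: 2, 9: 4, 10: 4, 11: 4,
             12: 4, 15: 3, 16: 2, 17: 4, 18: 4, 19: 2, 20: 2}

def is_interface(class_bytes):
    """Return True if the given .class file bytes describe an interface."""
    magic, _minor, _major, cp_count = struct.unpack_from(">IHHH", class_bytes, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    offset = 10
    index = 1  # constant pool indices start at 1
    while index < cp_count:
        tag = class_bytes[offset]
        offset += 1
        if tag == 1:  # CONSTANT_Utf8: 2-byte length, then that many bytes
            (length,) = struct.unpack_from(">H", class_bytes, offset)
            offset += 2 + length
        elif tag in _CP_SIZES:
            offset += _CP_SIZES[tag]
        else:
            raise ValueError("unknown constant pool tag %d" % tag)
        # Long and Double entries occupy two constant pool slots.
        index += 2 if tag in (5, 6) else 1
    # The access flags follow directly after the constant pool.
    (access_flags,) = struct.unpack_from(">H", class_bytes, offset)
    return bool(access_flags & ACC_INTERFACE)

def jobcontext_is_interface(jar_path):
    """Check JobContext inside a Hadoop jar (the path is supplied by the caller)."""
    entry = "org/apache/hadoop/mapreduce/JobContext.class"
    with zipfile.ZipFile(jar_path) as jar:
        return is_interface(jar.read(entry))
```

Calling jobcontext_is_interface on the Hadoop jar used by the cluster returns
True for the interface flavor. If that disagrees with the Hadoop version
Mahout was built against, rebuilding against the cluster's version is the
route that worked in this thread.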

Now I have a new problem. I am working on a very sparse dataset: 60M
records from 3M users and 12M items. Running on a 9-machine cluster, it
takes 13 minutes per iteration for 20 features, but tens of hours per
iteration for 60 features. Has anyone else seen this?
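
A rough way to sanity-check these numbers: in a standard ALS formulation,
recomputing one user's (or item's) feature vector with k features means
building a k x k normal-equation matrix from its ratings, costing about k^2
per rating, and then solving it, costing about k^3. The Python sketch below
applies that model to the figures quoted above. It is a back-of-envelope
estimate under that assumed cost model, not a profile of Mahout's job;
constants, I/O, and memory effects are ignored, and the function name is
hypothetical.

```python
def als_iteration_cost(num_ratings, num_users, num_items, k):
    """Approximate floating-point work for one ALS iteration (both half-steps).

    Assumed model: each rating contributes k*k work to a normal-equation
    matrix in each half-step, and each user and item needs one k*k*k solve.
    """
    build = 2 * num_ratings * k * k
    solve = (num_users + num_items) * k ** 3
    return build + solve

ratings, users, items = 60_000_000, 3_000_000, 12_000_000
cost_20 = als_iteration_cost(ratings, users, items, 20)
cost_60 = als_iteration_cost(ratings, users, items, 60)
ratio = cost_60 / cost_20
print("estimated cost ratio, 60 vs 20 features: %.1f" % ratio)
print("13 min scaled by that ratio: %.1f hours" % (13 * ratio / 60))
```

Under this model, tripling k multiplies the work by somewhere between 9 (if
the k^2 term dominates) and 27 (if the k^3 term dominates), so 13 minutes
would grow to a few hours. Tens of hours per iteration would then point at
something beyond arithmetic, such as memory pressure, though that is
speculation without profiling.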

Thanks,
Ying




On Thu, Jan 17, 2013 at 11:09 AM, Pat Ferrel <pa...@gmail.com> wrote:

> There is a problem in factorize-movielens-1M.sh, and DatasetSplitter needs
> to initialize the args parser before it accesses the options (I think I
> filed a ticket for DatasetSplitter with a patch). The last problem (the
> IncompatibleClassChangeError) is Ying Liao's alone.
>
> On Jan 17, 2013, at 7:12 AM, Sebastian Schelter <ss...@apache.org> wrote:
>
> Which version/distribution of Hadoop are you using?
>
> On 17.01.2013 16:08, Pat Ferrel wrote:
> > +1 this, found the same problems, same fixes. Haven't seen your last
> > problem

Re: is Hadoop based SVD_ALS a complete feature?

Posted by Pat Ferrel <pa...@gmail.com>.
There is a problem in factorize-movielens-1M.sh, and DatasetSplitter needs to initialize the args parser before it accesses the options (I think I filed a ticket for DatasetSplitter with a patch). The last problem (the IncompatibleClassChangeError) is Ying Liao's alone.
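
The DatasetSplitter issue described here is an ordering bug: option values are read before the argument parser has run, so the lookup has nothing to return. The Python sketch below shows the same pattern with argparse. It is an analogy only; the class SplitterJob and its methods are hypothetical and do not mirror Mahout's AbstractJob API.

```python
import argparse

class SplitterJob:
    """Toy job that must parse its arguments before reading any option."""

    def __init__(self):
        self._parser = argparse.ArgumentParser()
        self._parser.add_argument("--trainingPercentage", type=float, default=0.9)
        self._parser.add_argument("--probePercentage", type=float, default=0.1)
        self._options = None  # filled in by parse_arguments()

    def parse_arguments(self, argv):
        self._options = vars(self._parser.parse_args(argv))

    def get_option(self, name):
        # Reading an option before parsing is the bug pattern; fail loudly.
        if self._options is None:
            raise RuntimeError("parse_arguments() must run before get_option()")
        return self._options[name]

    def run(self, argv):
        # The fix: parse first, then read options.
        self.parse_arguments(argv)
        return (self.get_option("trainingPercentage"),
                self.get_option("probePercentage"))

job = SplitterJob()
print(job.run(["--trainingPercentage", "0.8", "--probePercentage", "0.2"]))
```

Failing loudly in get_option, rather than returning an empty value, is a design choice in this sketch that makes the ordering mistake obvious at the first lookup.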

On Jan 17, 2013, at 7:12 AM, Sebastian Schelter <ss...@apache.org> wrote:

Which version/distribution of Hadoop are you using?

On 17.01.2013 16:08, Pat Ferrel wrote:
> +1 this, found the same problems, same fixes. Haven't seen your last problem



Re: is Hadoop based SVD_ALS a complete feature?

Posted by Sebastian Schelter <ss...@apache.org>.
Which version/distribution of Hadoop are you using?

On 17.01.2013 16:08, Pat Ferrel wrote:
> +1 this, found the same problems, same fixes. Haven't seen your last problem


Re: is Hadoop based SVD_ALS a complete feature?

Posted by Pat Ferrel <pa...@gmail.com>.
+1 this, found the same problems, same fixes. Haven't seen your last problem.
