Posted to dev@mahout.apache.org by Dmitriy Lyubimov <dl...@gmail.com> on 2010/12/06 00:17:41 UTC

Command line integration question

Dear all,

I am testing the command line integration for the SSVD patch in hadoop mode
and running into some difficulties.
Even though I defined $HADOOP_HOME and $HADOOP_CONF_DIR, apparently the dfs
configuration is not being picked up.

I do run on CDH3b3; however, all hadoop configuration is 100% compatible
with 0.20. I am using AbstractJob.getConf() to acquire initial properties,
but it looks like fs.default.name is still not being set. I also tried to
locate the code that loads the hadoop conf but wasn't immediately able to
find it. Could you please tell me what I need to do to retrieve the initial
hadoop configuration correctly? I am missing something very simple here.

Thank you in advance.
-Dmitriy

bin/mahout ssvd -i /mahout/ssvdtest/A -o /mahout/ssvd-out/1 -k 100 -p 100 -r 200
Running on hadoop, using HADOOP_HOME=/home/dmitriy/tools/hadoop
HADOOP_CONF_DIR=/home/dmitriy/tools/hadoop/conf
10/12/05 15:09:55 INFO common.AbstractJob: Command line arguments: {--blockHeight=200, --computeU=true, --computeV=true, --endPhase=2147483647, --input=/mahout/ssvdtest/A, --minSplitSize=-1, --output=/mahout/ssvd-out/1, --oversampling=100, --rank=100, --reduceTasks=1, --startPhase=0, --tempDir=temp}
Exception in thread "main" java.lang.NullPointerException
        at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:118)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:110)
        at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:177)
        at org.apache.mahout.math.hadoop.stochasticsvd.SSVDCli.run(SSVDCli.java:75)
        at org.apache.mahout.math.hadoop.stochasticsvd.SSVDCli.main(SSVDCli.java:108)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:182)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)

Re: Command line integration question

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Yes it was meant to be mahout's. Honest error, sorry.

apologies for brevity.

Sent from my android.
-Dmitriy
On Dec 6, 2010 3:08 AM, "Lars George" <la...@gmail.com> wrote:

Re: Command line integration question

Posted by Lars George <la...@gmail.com>.
Hi Dmitriy,

I think you sent this to the wrong list? You sent to hbase-user but
this is a Mahout related question. Please check.

Lars


Re: Command line integration question

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
OK, I think I got it. Mahout uses the standard ToolRunner to preconfigure the
client. Thanks.
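
For readers following the thread: the stack trace in the original post goes
from SSVDCli.main straight into SSVDCli.run, with no ToolRunner frame in
between, so the NullPointerException in FileSystem.getDefaultUri was most
likely caused by a null Configuration rather than by a missing
fs.default.name entry. The sketch below imitates that pattern with plain-Java
stand-ins. MiniConf, MiniTool and runWithRunner are hypothetical names
invented for illustration; they are not Hadoop or Mahout classes.

```java
import java.util.HashMap;
import java.util.Map;

public class ToolRunnerSketch {

    /** Hypothetical stand-in for a Hadoop Configuration: just a string map. */
    static class MiniConf {
        final Map<String, String> props = new HashMap<>();

        String get(String key, String dflt) {
            return props.getOrDefault(key, dflt);
        }
    }

    /** Hypothetical stand-in for a configurable tool: conf stays null until injected. */
    static class MiniTool {
        private MiniConf conf; // null unless a runner calls setConf()

        void setConf(MiniConf c) {
            conf = c;
        }

        MiniConf getConf() {
            return conf;
        }

        String run() {
            // Dereferences whatever getConf() returns, like the failing call chain.
            return getConf().get("fs.default.name", "file:///");
        }
    }

    /** Hypothetical stand-in for the runner: injects the conf, then runs the tool. */
    static String runWithRunner(MiniTool tool, MiniConf conf) {
        tool.setConf(conf);
        return tool.run();
    }

    public static void main(String[] args) {
        // Calling run() directly: nobody injected a configuration, so this throws.
        try {
            new MiniTool().run();
        } catch (NullPointerException e) {
            System.out.println("direct run(): NullPointerException");
        }

        // Going through the runner: the configuration is in place before run().
        MiniConf conf = new MiniConf();
        conf.props.put("fs.default.name", "hdfs://namenode.example:8020"); // made-up value
        System.out.println("via runner: " + runWithRunner(new MiniTool(), conf));
    }
}
```

With the real API, the equivalent step would be a main() that calls
ToolRunner.run(new Configuration(), tool, args) instead of invoking run()
directly.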

On Sun, Dec 5, 2010 at 3:28 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> PS. Also, if I needed to play with various MR settings, such as child
> process arguments, could I pass them on to the Configuration object through
> the command line? Or would I have to add a definition for a custom job
> setting for every instance where I'd want to supply a custom MR setting?
>
>
> Thank you in advance.
> -Dmitriy

Re: Command line integration question

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Thank you, Sean.

On Sun, Dec 5, 2010 at 3:38 PM, Sean Owen <sr...@gmail.com> wrote:

> fs.default.name is conventionally configured in conf/core-site.xml. This is
> environment-specific so it can't really be configured in Mahout code.
>
> (But you can manipulate the Job you get from prepareJob() by calling
> getConfiguration()).
>
> You should be able to pass more key-value pairs as you like via JVM system
> properties -- that is -- "-Dmapred.input.dir=..."

Re: Command line integration question

Posted by Sean Owen <sr...@gmail.com>.
fs.default.name is conventionally configured in conf/core-site.xml. This is
environment-specific so it can't really be configured in Mahout code.

(But you can manipulate the Job you get from prepareJob() by calling
getConfiguration()).

You should be able to pass more key-value pairs as you like via JVM system
properties -- that is -- "-Dmapred.input.dir=..."
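
A note on the -D suggestion: when a job is launched through ToolRunner, such
pairs are consumed by Hadoop's generic option parsing and copied into the job
Configuration, which is slightly different from true JVM system properties.
The sketch below is a simplified, plain-Java imitation of that split and not
Hadoop's GenericOptionsParser itself; GenericOptionsSketch, Parsed and parse
are made-up names used only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GenericOptionsSketch {

    /** Result of the split: -D pairs on one side, all other arguments on the other. */
    static class Parsed {
        final Map<String, String> props = new LinkedHashMap<>();
        final List<String> remaining = new ArrayList<>();
    }

    /** Separates "-Dkey=value" and "-D key=value" from the tool's own options. */
    static Parsed parse(String[] args) {
        Parsed out = new Parsed();
        for (int i = 0; i < args.length; i++) {
            String a = args[i];
            String pair;
            if (a.equals("-D") && i + 1 < args.length && args[i + 1].indexOf('=') > 0) {
                pair = args[++i];          // "-D key=value": the pair is the next token
            } else if (a.startsWith("-D") && a.indexOf('=') > 2) {
                pair = a.substring(2);     // "-Dkey=value": the pair follows the flag
            } else {
                out.remaining.add(a);      // anything else is left for the tool itself
                continue;
            }
            int eq = pair.indexOf('=');    // split on the first '=' only
            out.props.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] demo = {
            "-Dmapred.child.java.opts=-Xmx1024m",
            "-i", "/mahout/ssvdtest/A", "-k", "100"
        };
        Parsed p = parse(demo);
        System.out.println("configuration overrides: " + p.props);
        System.out.println("left for the job:        " + p.remaining);
    }
}
```

For the child process arguments asked about in this thread, the 0.20 property
is mapred.child.java.opts, so something like
-Dmapred.child.java.opts=-Xmx1024m placed before the job's own options
should, under these assumptions, end up in the Configuration.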

On Sun, Dec 5, 2010 at 11:28 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> PS. Also, if I needed to play with various MR settings, such as child
> process arguments, could I pass them on to the Configuration object through
> the command line? Or would I have to add a definition for a custom job
> setting for every instance where I'd want to supply a custom MR setting?
>
> Thank you in advance.
> -Dmitriy

Re: Command line integration question

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
PS. Also, if I needed to play with various MR settings, such as child
process arguments, could I pass them on to the Configuration object through
the command line? Or would I have to add a definition for a custom job
setting for every instance where I'd want to supply a custom MR setting?

Thank you in advance.
-Dmitriy



