Posted to user@mahout.apache.org by "Quiroz Hernandez, Andres" <An...@xerox.com> on 2011/02/05 00:36:25 UTC

Running algorithms from within java program

Hello,

I have set up a pre-compiled release of Mahout 0.3 on top of a Hadoop
0.20.2 cluster and am able to run Mahout algorithms from the command
line without a problem, e.g.:

mahout seq2sparse -i input_dir -o output_dir -wt tf -seq

However, I tried invoking the algorithms from within a Java program
using the MahoutDriver class in the following way:

MahoutDriver.main(args);

Where args = {"seq2sparse", "-i", "input_dir", "-o", "output_dir",
"-wt", "tf", "-seq"}

This call fails with the message:

11/02/04 18:20:58 ERROR driver.MahoutDriver: MahoutDriver failed with
args: [seq2sparse, -i, input_dir, -o, output_dir, -wt, tf, -seq]
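For reference, the call described above can be written out as a small
standalone program. This is only a sketch: the fully qualified name
org.apache.mahout.driver.MahoutDriver is assumed from the
"driver.MahoutDriver" prefix in the log line, and the DriverInvoker
class with its two helper methods is hypothetical, not part of Mahout.
The driver is looked up by reflection so that a missing class is
reported by name instead of surfacing as a generic failure.

```java
import java.lang.reflect.Method;

final class DriverInvoker {

    /** Builds the same arguments as the command line shown earlier. */
    static String[] seq2sparseArgs(String inputDir, String outputDir) {
        return new String[] {
            "seq2sparse", "-i", inputDir, "-o", outputDir, "-wt", "tf", "-seq"
        };
    }

    /**
     * Calls the public static main(String[]) of the named class.
     * Returns false if the class cannot be found on the classpath.
     */
    static boolean invokeMain(String className, String[] args) {
        Class<?> driver;
        try {
            driver = Class.forName(className);
        } catch (ClassNotFoundException e) {
            System.err.println("Not on the classpath: " + className);
            return false;
        }
        try {
            Method main = driver.getMethod("main", String[].class);
            // Cast to Object so the array is passed as a single argument.
            main.invoke(null, (Object) args);
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not run " + className, e);
        }
    }

    public static void main(String[] unused) {
        // Assumed class name; adjust if the driver lives elsewhere.
        String driverClass = "org.apache.mahout.driver.MahoutDriver";
        invokeMain(driverClass, seq2sparseArgs("input_dir", "output_dir"));
    }
}
```

If this prints "Not on the classpath", the driver class itself is not
visible to the JVM. If the class loads but the run still fails, the
missing piece is more likely one of the classes it depends on.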

I believe the problem is that I am not passing all of the jar
dependencies that the Mahout driver class needs to run the algorithm.
The mahout run script appears to take care of this, but I am not very
familiar with shell scripting and cannot tell exactly how. If I am
correct, please let me know how I can include those dependencies (all
of which I assume are in the $MAHOUT_HOME/lib folder), either in the
arguments or otherwise. If not, please let me know the correct way to
start the algorithms from code.
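One way to test the dependency theory from inside the program is to
build a class loader over the jars in the Mahout directories and load
the driver through it. The sketch below uses only the standard library.
It assumes the jars sit in $MAHOUT_HOME and $MAHOUT_HOME/lib and that
the driver is org.apache.mahout.driver.MahoutDriver; the LibClassLoader
class and its methods are hypothetical helpers. It covers client-side
class loading only; a job submitted to the cluster may still need its
classes shipped to the task nodes by other means.

```java
import java.io.File;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LibClassLoader {

    /** Returns URLs for every *.jar file directly inside dir. */
    static URL[] jarUrls(File dir) {
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return new URL[0]; // dir is missing or not a directory
        }
        Arrays.sort(jars);
        URL[] urls = new URL[jars.length];
        try {
            for (int i = 0; i < jars.length; i++) {
                urls[i] = jars[i].toURI().toURL();
            }
        } catch (MalformedURLException e) {
            throw new UncheckedIOException(e);
        }
        return urls;
    }

    /** Builds one class loader over the jars of all given directories. */
    static URLClassLoader forDirs(File... dirs) {
        List<URL> all = new ArrayList<>();
        for (File dir : dirs) {
            all.addAll(Arrays.asList(jarUrls(dir)));
        }
        return new URLClassLoader(all.toArray(new URL[0]),
                LibClassLoader.class.getClassLoader());
    }

    public static void main(String[] unused) throws Exception {
        String home = System.getenv("MAHOUT_HOME");
        if (home == null) {
            System.err.println("MAHOUT_HOME is not set");
            return;
        }
        URLClassLoader loader = forDirs(new File(home), new File(home, "lib"));
        // Libraries that resolve classes through the context class loader
        // will then see the same jars.
        Thread.currentThread().setContextClassLoader(loader);
        try {
            Class<?> driver = Class.forName(
                    "org.apache.mahout.driver.MahoutDriver", true, loader);
            String[] args = {"seq2sparse", "-i", "input_dir",
                             "-o", "output_dir", "-wt", "tf", "-seq"};
            driver.getMethod("main", String[].class).invoke(null, (Object) args);
        } catch (ClassNotFoundException e) {
            System.err.println("Still missing: " + e.getMessage());
        }
    }
}
```

The simpler alternative is to put the same jars on the JVM classpath
when starting the program, which avoids the custom loader entirely.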

I also tried calling the SparseVectorsFromSequenceFiles class (or any
other algorithm driver class) directly, with the same arguments minus
the short name (seq2sparse). That call fails more explicitly with a
ClassNotFoundException, which is why I concluded that dependencies are
the problem.

Thank you for your help,

Andres

Re: Running algorithms from within java program

Posted by Ted Dunning <te...@gmail.com>.
Aside from the issues you describe, I would suggest moving to 0.4 or
even trunk. A lot of improvements have been made since 0.3.
