Posted to common-user@hadoop.apache.org by Nikolaos Romanos Katsipoulakis <po...@gmail.com> on 2012/06/05 08:20:26 UTC

Web Service Interface for triggering a Hadoop Job

Hello everybody.
I want to trigger the execution of an ItemSimilarityJob (Mahout 0.7
snapshot) from a web service interface. To do that, I want to implement
a class that holds an ItemSimilarityJob object and invokes that object's
run method whenever a WS request arrives. Is this possible, and if so,
how is it done?
I am posting the code that I have written below:

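import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob;

public class Main {

     public static void main(String[] args) throws IOException {
         // Load the cluster configuration files.
         // Note: jobConf is built here but never handed to the job below.
         Configuration jobConf = new Configuration();
         jobConf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
         jobConf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
         jobConf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));
         ItemSimilarityJob myJob = new ItemSimilarityJob();
         // Input and output are passed as -D properties, not as --input/--output.
         String[] args1 = { "-Dmapred.input.dir=input/input.txt",
                 "-Dmapred.output.dir=output", "--similarityClassname",
                 "SIMILARITY_COOCCURRENCE" };
         try {
             // main is static, so this call does not use the myJob instance.
             myJob.main(args1);
         } catch (Exception e) {
             System.err.println(e.getMessage());
         }
     }

}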

The output I get is:

Jun 5, 2012 9:14:46 AM org.apache.mahout.common.AbstractJob parseArguments
SEVERE: Unexpected mapred.output.dir=output while processing Job-Specific Options:
usage: <command> [Generic Options] [Job-Specific Options]
Generic Options:
  -archives <paths>              comma separated archives to be unarchived
                                 on the compute machines.
  -conf <configuration file>     specify an application configuration file
  -D <property=value>            use value for given property
  -files <paths>                 comma separated files to be copied to the
                                 map reduce cluster
  -fs <local|namenode:port>      specify a namenode
  -jt <local|jobtracker:port>    specify a job tracker
  -libjars <paths>               comma separated jar files to include in
                                 the classpath.
  -tokenCacheFile <tokensFile>   name of the file with the tokens
Unexpected mapred.output.dir=output while processing Job-Specific Options:
Usage:
  [--input <input> --output <output> --similarityClassname <similarityClassname>
--maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefsPerUser
<maxPrefsPerUser> --minPrefsPerUser <minPrefsPerUser> --booleanData
<booleanData> --threshold <threshold> --help --tempDir <tempDir> --startPhase
<startPhase> --endPhase <endPhase>]
Job-Specific Options:
   --input (-i) input                                      Path to job input
                                                           directory.
   --output (-o) output                                    The directory
                                                           pathname for output.
   --similarityClassname (-s) similarityClassname          Name of distributed
                                                           similarity measures
                                                           class to instantiate,
                                                           alternatively use one
                                                           of the predefined
                                                           similarities
                                                           ([SIMILARITY_COOCCURRENCE,
                                                           SIMILARITY_LOGLIKELIHOOD,
                                                           SIMILARITY_TANIMOTO_COEFFICIENT,
                                                           SIMILARITY_CITY_BLOCK,
                                                           SIMILARITY_COSINE,
                                                           SIMILARITY_PEARSON_CORRELATION,
                                                           SIMILARITY_EUCLIDEAN_DISTANCE])
   --maxSimilaritiesPerItem (-m) maxSimilaritiesPerItem    try to cap the number
                                                           of similar items per
                                                           item to this number
                                                           (default: 100)
   --maxPrefsPerUser (-mppu) maxPrefsPerUser               max number of
                                                           preferences to
                                                           consider per user,
                                                           users with more
                                                           preferences will be
                                                           sampled down
                                                           (default: 1000)
   --minPrefsPerUser (-mp) minPrefsPerUser                 ignore users with
                                                           less preferences than
                                                           this (default: 1)
   --booleanData (-b) booleanData                          Treat input as
                                                           without pref values
   --threshold (-tr) threshold                             discard item pairs
                                                           with a similarity
                                                           value below this
   --help (-h)                                             Print out help
   --tempDir tempDir                                       Intermediate output
                                                           directory
   --startPhase startPhase                                 First phase to run
   --endPhase endPhase                                     Last phase to run

Why do I get the above output?

Thank you in advance.

Nick K.

Re: Web Service Interface for triggering a Hadoop Job

Posted by Nitin Pawar <ni...@gmail.com>.
You may want to check this on the Mahout user group.

Jun 5, 2012 9:14:46 AM org.apache.mahout.common.AbstractJob parseArguments
SEVERE: Unexpected mapred.output.dir=output while processing Job-Specific Options:

This looks like a command-line argument parsing error.
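
One thing that may be worth trying, sketched under the assumption that the
job accepts --input, --output and --similarityClassname exactly as the usage
text in the original message prints them: pass those job-specific options
instead of the -Dmapred.* properties. The class SimilarityJobArgs and its
build method are hypothetical names used only for illustration; they are not
part of Mahout or Hadoop, and whether this removes the parsing error on a
given setup is untested.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Mahout or Hadoop): builds an argument
// array that uses only the job-specific options listed in the usage text.
final class SimilarityJobArgs {

    private SimilarityJobArgs() {
    }

    // Returns { "--input", input, "--output", output,
    //           "--similarityClassname", similarityClassname }.
    static String[] build(String input, String output, String similarityClassname) {
        List<String> args = new ArrayList<>();
        args.add("--input");
        args.add(input);
        args.add("--output");
        args.add(output);
        args.add("--similarityClassname");
        args.add(similarityClassname);
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        String[] jobArgs = build("input/input.txt", "output", "SIMILARITY_COOCCURRENCE");
        // prints: --input input/input.txt --output output --similarityClassname SIMILARITY_COOCCURRENCE
        System.out.println(String.join(" ", jobArgs));
        // Assumption: with Hadoop and Mahout on the classpath, the array could
        // then be handed to the job together with the loaded configuration:
        //   int exitCode = ToolRunner.run(jobConf, new ItemSimilarityJob(), jobArgs);
    }
}
```

Going through ToolRunner.run(Configuration, Tool, String[]) rather than the
static main would also let the Configuration built from the /etc/hadoop/conf
files reach the job, which the posted code does not do.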

-- 
Nitin Pawar