Posted to common-user@hadoop.apache.org by "Aquil H. Abdullah" <aq...@gmail.com> on 2011/08/01 18:11:27 UTC

Using -libjar option

Hello All,

I am new to Hadoop, and I am trying to use the GenericOptionsParser class.
In particular, I would like to use the -libjar option to specify additional
jar files to include in the classpath. I've created a class that extends
Configured and implements Tool:

public class OptionDemo extends Configured implements Tool
{
    ...

    public int run(String[] args) throws Exception
    {
        Configuration conf = getConf();
        GenericOptionsParser opts = new GenericOptionsParser(conf, args);
        ...
    }
}


However, when I run my code the jar files that I include after -libjar
aren't being added to the classpath and I receive an error that certain
classes can't be found during the execution of my job.

The book Hadoop: The Definitive Guide states:

You don’t usually use GenericOptionsParser directly, as it’s more convenient
to implement the Tool interface and run your application with the
ToolRunner, which uses GenericOptionsParser internally:
public interface Tool extends Configurable {
    int run(String [] args) throws Exception;
}

but it still isn't clear to me how the -libjars option is parsed, whether I
need to add the jars to the classpath explicitly inside my run method, or
where the option needs to go on the command line. Any advice or sample code
on using -libjar would be greatly appreciated.

-- 
Aquil H. Abdullah
aquil.abdullah@gmail.com

Re: Using -libjar option

Posted by John Armstrong <jo...@ccri.com>.
On Mon, 1 Aug 2011 12:11:27 -0400, "Aquil H. Abdullah"
<aq...@gmail.com> wrote:
> but it still isn't clear to me how the -libjars option is parsed, whether
> or not I need to explicitly add it to the classpath inside my run method, or
> where it needs to be placed in the command-line?

IIRC it's parsed as a comma-separated list of file paths relative to your
current working directory, and the local copies that it makes on each
cluster node are automatically added to the tasks' classpaths.

Can you give an example of how you're trying to use it?
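A minimal sketch of the path handling John describes, assuming nothing about Hadoop's internals: the helper below (LibJarPaths.resolveLibJars, a hypothetical name, not a Hadoop method) splits a -libjars style value on commas and resolves each entry against the current working directory. The copying of the jars to the cluster nodes that John mentions is not modeled here.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Hadoop API: models only how a comma-separated
// -libjars value could be turned into absolute file: URLs.
class LibJarPaths {
    static List<URL> resolveLibJars(String commaSeparated) {
        List<URL> urls = new ArrayList<>();
        for (String part : commaSeparated.split(",")) {
            String path = part.trim();
            if (path.isEmpty()) {
                continue; // skip empty entries such as "a.jar,,b.jar"
            }
            try {
                // A relative path is resolved against the current working
                // directory (the user.dir system property).
                urls.add(new File(path).getAbsoluteFile().toURI().toURL());
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Cannot convert to URL: " + path, e);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println("working directory: " + System.getProperty("user.dir"));
        for (URL url : resolveLibJars("mytestlib.jar,lib/other.jar")) {
            System.out.println(url);
        }
    }
}
```

Running it from two different directories prints two different sets of URLs for the same argument, which is the practical consequence of relative paths being resolved against wherever the command was launched.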

Re: Using -libjar option

Posted by John Armstrong <jo...@ccri.com>.
On Mon, 1 Aug 2011 15:30:49 -0400, "Aquil H. Abdullah"
<aq...@gmail.com> wrote:
> Don't I feel sheepish...

Happens to the best, or so they tell me.

> OK, so I've hacked this sample code below, from the ConfigurationPrinter
> example in Hadoop: The Definitive Guide. If -libjars had been added to the
> configuration I would expect to see it when I iterate over the urls, however
> I see it as one of the remaining options:

It might help you to read over the source code of the ToolRunner class.  I
know it did for me.

Re: Using -libjar option

Posted by "Aquil H. Abdullah" <aq...@gmail.com>.
Don't I feel sheepish...

OK, so I've hacked together the sample code below from the ConfigurationPrinter
example in Hadoop: The Definitive Guide. If -libjars had been added to the
configuration, I would expect to see it when I iterate over the URLs; instead,
I see it among the remaining arguments:

***OUTPUT***
remaining args -libjars
remaining args C:\Apps\mahout-distribution-0.5\mahout-core-0.5.jar
***
[Source Code]
package test.option.demo;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.util.*;
// import java.util.*;
import java.net.URL;
// import java.util.Map.Entry;

public class OptionDemo extends Configured implements Tool
{
    static
    {
        // Make every Configuration also read the HDFS and MapReduce
        // default and site files.
        Configuration.addDefaultResource("hdfs-default.xml");
        Configuration.addDefaultResource("hdfs-site.xml");
        Configuration.addDefaultResource("mapred-default.xml");
        Configuration.addDefaultResource("mapred-site.xml");
    }

    @Override
    public int run(String[] args) throws Exception
    {
        GenericOptionsParser opt = new GenericOptionsParser(args);
        Configuration conf = opt.getConfiguration();
        // for (Entry<String, String> entry: conf)
        // {
        //     System.out.printf("%s=%s\n", entry.getKey(), entry.getValue());
        // }

        // Print every argument that reached run()
        for (int i = 0; i < args.length; i++)
        {
            System.out.printf("remaining args %s%n", args[i]);
        }

        // URLs of any jars recorded in the configuration via -libjars
        URL[] urls = GenericOptionsParser.getLibJars(conf);

        if (urls != null)
        {
            for (int j = 0; j < urls.length; j++)
            {
                System.out.printf("url[%d] %s%n", j, urls[j].toString());
            }
        }
        else
        {
            System.out.println("No libraries added to configuration");
        }

        return 0;
    }

    public static void main(String[] args) throws Exception
    {
        int exitCode = ToolRunner.run(new OptionDemo(), args);
        System.exit(exitCode);
    }
}



On Mon, Aug 1, 2011 at 2:17 PM, John Armstrong <jo...@ccri.com> wrote:

> On Mon, 1 Aug 2011 13:21:27 -0400, "Aquil H. Abdullah"
> <aq...@gmail.com> wrote:
> > [AA] I am currently invoking my application as follows:
> >
> > hadoop jar /home/test/hadoop/test.option.demo.jar
> > test.option.demo.OptionDemo -libjar /home/test/hadoop/lib/mytestlib.jar
>
> I believe the problem might be that it's looking for "-libjars", not
> "-libjar".
>



-- 
Aquil H. Abdullah
aquil.abdullah@gmail.com

Re: Using -libjar option

Posted by John Armstrong <jo...@ccri.com>.
On Mon, 1 Aug 2011 13:21:27 -0400, "Aquil H. Abdullah"
<aq...@gmail.com> wrote:
> [AA] I am currently invoking my application as follows:
> 
> hadoop jar /home/test/hadoop/test.option.demo.jar
> test.option.demo.OptionDemo -libjar /home/test/hadoop/lib/mytestlib.jar

I believe the problem might be that it's looking for "-libjars", not
"-libjar".
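A toy model of why the spelling matters, and of where generic options have to go. This is not Hadoop code: GenericArgsSketch and its option table are hypothetical, option names must match exactly, and only options written as "-name value" are modeled (so -D is left out). It assumes, as I understand GenericOptionsParser's use of Commons CLI, that parsing stops at the first token that is not a recognized generic option and everything from there on is passed to run() untouched. Under that assumption, "-libjar" lands in the remaining arguments together with its path, and generic options placed after an application argument are never seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of generic-option parsing; not Hadoop code.
class GenericArgsSketch {
    // Hypothetical table of generic options that take one value.
    static final Set<String> VALUE_OPTIONS =
            Set.of("-conf", "-fs", "-jt", "-files", "-libjars", "-archives");

    // Moves leading recognized options into 'parsed'; returns what run() would get.
    static List<String> parse(String[] args, Map<String, String> parsed) {
        int i = 0;
        while (i + 1 < args.length && VALUE_OPTIONS.contains(args[i])) {
            parsed.put(args[i], args[i + 1]);
            i += 2;
        }
        // Stop at the first unrecognized token: it and everything after it are left over.
        return new ArrayList<>(Arrays.asList(args).subList(i, args.length));
    }

    public static void main(String[] args) {
        String jar = "/home/test/hadoop/lib/mytestlib.jar";
        String[][] cases = {
            {"-libjar", jar},           // misspelled: not recognized
            {"-libjars", jar},          // recognized and consumed
            {"input", "-libjars", jar}, // too late: parsing already stopped
        };
        for (String[] c : cases) {
            Map<String, String> parsed = new LinkedHashMap<>();
            List<String> remaining = parse(c, parsed);
            System.out.println(Arrays.toString(c) + " -> parsed " + parsed + ", remaining " + remaining);
        }
    }
}
```

Applied to the invocation above, John's correction amounts to:

    hadoop jar /home/test/hadoop/test.option.demo.jar test.option.demo.OptionDemo -libjars /home/test/hadoop/lib/mytestlib.jar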

Re: Using -libjar option

Posted by "Aquil H. Abdullah" <aq...@gmail.com>.
[See Response Inline]

I've tried invoking getLib
On Mon, Aug 1, 2011 at 12:56 PM, Harsh J <ha...@cloudera.com> wrote:

> Aquil,
>
> On a side-note, if you use Tool, GenericOptsParser is automatically
> used internally (by ToolRunner), so you don't have to re-parse your
> args in your run(…) method. What you get as run(args) are the remnant
> args alone, if your application handles any.
>
[AA] Thanks for clearing that up!

>
> Would help, as John pointed out, if you could give your exact,
> invoking CLI command.
>

[AA] I am currently invoking my application as follows:

hadoop jar /home/test/hadoop/test.option.demo.jar
test.option.demo.OptionDemo -libjar /home/test/hadoop/lib/mytestlib.jar





-- 
Aquil H. Abdullah
aquil.abdullah@gmail.com

Re: Using -libjar option

Posted by Harsh J <ha...@cloudera.com>.
Aquil,

On a side-note, if you use Tool, GenericOptionsParser is automatically
used internally (by ToolRunner), so you don't have to re-parse your
args in your run(…) method. What run(args) receives is only the remaining
args, if your application takes any.

As John pointed out, it would help if you could give the exact command
line you use to invoke it.
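A toy model of the flow described above, built from hypothetical stand-ins (MiniTool, MiniToolRunner, and a Map in place of Configuration) rather than Hadoop's classes: the runner consumes -libjars before calling run(), so run() should read it back from its configuration rather than from args. The key "tmpjars" mirrors the property GenericOptionsParser stores the -libjars list under, as far as I know, but treat it as an assumption. The sketch also shows why re-parsing the leftover args inside run(), as OptionDemo does, comes up empty.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Plays the role of OptionDemo: reads -libjars from the configuration it was given.
class OptionDemoSketch implements MiniTool {
    private Map<String, String> conf;
    String seenLibJars;      // what run() finds through getConf()
    String reparsedLibJars;  // what re-parsing the leftover args finds

    public void setConf(Map<String, String> conf) { this.conf = conf; }

    public Map<String, String> getConf() { return conf; }

    public int run(String[] args) {
        seenLibJars = getConf().get("tmpjars");
        // Re-parsing: the leftover args no longer contain -libjars.
        Map<String, String> fresh = new HashMap<>();
        MiniToolRunner.consumeLibJars(args, fresh);
        reparsedLibJars = fresh.get("tmpjars");
        System.out.println("run() args:    " + Arrays.toString(args));
        System.out.println("via getConf(): " + seenLibJars);
        System.out.println("via re-parse:  " + reparsedLibJars);
        return 0;
    }

    public static void main(String[] args) {
        int exitCode = MiniToolRunner.run(new OptionDemoSketch(),
                new String[] {"-libjars", "mytestlib.jar", "input"});
        System.out.println("exit code " + exitCode);
    }
}

// Hypothetical stand-in for Hadoop's Tool (a Map replaces Configuration).
interface MiniTool {
    void setConf(Map<String, String> conf);
    Map<String, String> getConf();
    int run(String[] args);
}

// Hypothetical stand-in for ToolRunner: parse first, then call run().
class MiniToolRunner {
    static int run(MiniTool tool, String[] args) {
        Map<String, String> conf = new HashMap<>();
        String[] remaining = consumeLibJars(args, conf);
        tool.setConf(conf);          // the tool gets the populated configuration...
        return tool.run(remaining);  // ...and only the arguments left over
    }

    // Moves leading "-libjars <list>" pairs into conf under "tmpjars"; returns the rest.
    static String[] consumeLibJars(String[] args, Map<String, String> conf) {
        int i = 0;
        while (i + 1 < args.length && args[i].equals("-libjars")) {
            conf.put("tmpjars", args[i + 1]);
            i += 2;
        }
        return Arrays.copyOfRange(args, i, args.length);
    }
}
```

In the real OptionDemo, the equivalent change would be to call getConf() inside run() and pass that Configuration to GenericOptionsParser.getLibJars, instead of constructing a new GenericOptionsParser from args.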




-- 
Harsh J