Posted to common-user@hadoop.apache.org by Vjeran Marcinko <vj...@email.t-com.hr> on 2013/04/21 09:13:11 UTC

Adding 3rd-party libs in easy way ? (libjars and "fatjar" too cumbersome)

Hi,

Can somebody tell me if there is an easy way to specify 3rd-party libs for
my MR driver application without having to:

1.    Create a fat jar by unpacking all dependency libs and packing them
again (which really takes some time for a couple of dozen dependency libs
with my Gradle fatjar plugin task)

2.    Specify the libs inside the "-libjars" option for Tool, which is
cumbersome since each one has to be listed individually, and that means
building this string somehow

Isn't there some way to specify just a directory, say "libs" on your
local drive, place the lib jars there, and have the driver configuration
pick them up? Or to pack all jars into one jar, but, unlike a fat jar, which
requires unpacking every lib and packing them again, just nest these jars
inside the new archive?

Regards,

Vjeran


RE: Adding 3rd-party libs in easy way ? (libjars and "fatjar" too cumbersome)

Posted by Vjeran Marcinko <vj...@email.t-com.hr>.
Yes, that's exactly what I wanted.
One more thing though: although most Hadoop MR job examples use the "hadoop
jar" command to start job-submitting apps, I don't like that "shell" way,
because job driver apps can then only be submitted from machines where
Hadoop is installed. I would much rather submit from my code (i.e.
programmatically), so I can execute the job submission from anywhere (such
as a complete Java product that submits jobs on a user's web request). That
way I can also submit jobs directly from my IDE, which is always the best
development environment, especially compared to the alternative: having
build scripts that package the app, deploy it remotely to a Hadoop machine,
and execute the "hadoop jar" command there just to see if it's working
(during development).

But most examples found on the web show the overly simple case of
programmatically submitting the WordCount example, which doesn't rely on any
3rd-party lib. From what I have read, it seems the DistributedCache mechanism
has to be used for that, so I'm asking if anyone has a good, complete
example of submitting jobs programmatically with 3rd-party jars included.
Moreover, the confusion around multiple MR APIs doesn't help either. I found
an example in the "Hadoop in Practice" book that uses its own JobHelper
util class to add jars to the job config, but it seems to place the jar paths
into some "tmpjars" property or something like that... In other words, I
would like to do programmatically what Tool apps get when using the -libjars
option with the "hadoop jar" command.
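
A minimal sketch of one way to get there, not a tested recipe:
ToolRunner.run() feeds its argument array through the same generic-option
parsing that "hadoop jar" relies on, so a driver started from plain Java can
pass "-libjars" itself. The helper below only builds the comma-separated
value from a directory of jars. The class name LibJars, the "libs" directory
and MyDriver in the note that follows are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: turns a directory of jars into a -libjars value. */
final class LibJars {

    private LibJars() {
    }

    /**
     * Returns the absolute paths of all *.jar files directly inside libDir,
     * joined with commas. Subdirectories are not scanned.
     */
    static String fromDirectory(Path libDir) throws IOException {
        List<String> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : stream) {
                jars.add(jar.toAbsolutePath().toString());
            }
        }
        // Directory listing order is unspecified, so sort for a stable result.
        Collections.sort(jars);
        return String.join(",", jars);
    }
}
```

With Hadoop on the classpath, the result would be handed over as
ToolRunner.run(new Configuration(), new MyDriver(), args), where args starts
with "-libjars" and LibJars.fromDirectory(Paths.get("libs")), followed by the
job's own arguments, and MyDriver is a hypothetical Tool implementation.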

Cheers,
Vjeran

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Sunday, April 21, 2013 2:09 PM
To: <us...@hadoop.apache.org>
Subject: Re: Adding 3rd-party libs in easy way ? (libjars and "fatjar" too
cumbersome)

The MR project supports jars that have a lib/ subdirectory inside them,
carrying all required dependencies. Would that not solve your need?
You don't need to re-pack things; just pack the jar with the lib/ directory
created inside it, holding the necessary dependencies, during the build itself.

--
Harsh J


Re: Adding 3rd-party libs in easy way ? (libjars and "fatjar" too cumbersome)

Posted by Harsh J <ha...@cloudera.com>.
The MR project supports jars that have a lib/ subdirectory inside them,
carrying all required dependencies. Would that not solve your need?
You don't need to re-pack things; just pack the jar with the lib/ directory
created inside it, holding the necessary dependencies, during the build itself.
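
Any jar tool or build script can produce that layout. As a rough sketch
using only the JDK: the code below writes the compiled classes at the root of
the job jar and copies each dependency jar, unopened, under lib/. The class
name JobJarBuilder and the three directory and file parameters are
hypothetical, and a real build would normally do this in its jar task
instead.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: builds a job jar with dependencies nested in lib/. */
final class JobJarBuilder {

    private JobJarBuilder() {
    }

    static void build(Path classesDir, Path libDir, Path jobJar) throws IOException {
        // Collect the compiled class files (and resources) first.
        List<Path> classFiles;
        try (Stream<Path> walk = Files.walk(classesDir)) {
            classFiles = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }

        try (OutputStream out = Files.newOutputStream(jobJar);
             JarOutputStream jar = new JarOutputStream(out)) {
            // Classes keep their package path relative to classesDir.
            for (Path file : classFiles) {
                String name = classesDir.relativize(file).toString().replace('\\', '/');
                addEntry(jar, name, file);
            }
            // Dependency jars are copied as-is; nothing is unpacked.
            try (DirectoryStream<Path> libs = Files.newDirectoryStream(libDir, "*.jar")) {
                for (Path lib : libs) {
                    addEntry(jar, "lib/" + lib.getFileName(), lib);
                }
            }
        }
    }

    private static void addEntry(JarOutputStream jar, String name, Path source) throws IOException {
        jar.putNextEntry(new JarEntry(name));
        Files.copy(source, jar);
        jar.closeEntry();
    }
}
```

Because the dependency jars are only copied, the cost is roughly that of
zipping them once, with none of the unpack-and-repack work a fat jar needs.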

-- 
Harsh J
