Posted to users@maven.apache.org by Robert Metzger <rm...@apache.org> on 2015/02/24 13:53:52 UTC

Build fat jar transitively excluding some dependencies

Hi,

I'm a committer at the Apache Flink project (a system for distributed data
processing).
We provide our users a quickstart maven archetype to bootstrap new Flink
jobs.

For the generated Flink job's Maven project, I would like to build a
fat jar that contains all the dependencies the user added to the project.
However, I don't want to include the Flink dependencies in the fat jar.
The purpose of the fat jar is to submit it to the cluster for executing the
user's job, so it should contain the user code and all the user's dependencies
BUT NOT the Flink dependencies, because we can assume those to be available
on the running cluster.

A fat-jar with Flink's dependencies is 60MB+, which can be annoying when
uploading the jars to a cluster.


I'm currently using the maven-shade-plugin to build the fat jar.

So my first idea was to exclude everything in the "org.apache.flink"
groupId from the fat jar. However, this is not possible, because:
- we can only expect some artifacts to be available at runtime (Flink ships
the core jars with the binary builds; "extensions" have to be loaded by the
user)
- if users put code in their archetype project into the "org.apache.flink"
namespace, we would exclude their code as well.

So what I'm looking for is a way to tell the shade (or maven assembly)
plugin to exclude a list of artifacts and their transitive dependencies.
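For illustration, here is roughly what that shade-plugin configuration looks like today (artifact names are examples, not our exact list). As far as I can tell, the <excludes> in <artifactSet> only match the artifacts listed, and the transitive dependencies of an excluded artifact still end up in the fat jar, which is exactly the gap:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <artifactSet>
          <excludes>
            <!-- these artifacts are excluded directly, but their
                 transitive dependencies are still bundled -->
            <exclude>org.apache.flink:flink-core</exclude>
            <exclude>org.apache.flink:flink-java</exclude>
            <exclude>org.apache.flink:flink-clients</exclude>
          </excludes>
        </artifactSet>
      </configuration>
    </execution>
  </executions>
</plugin>
```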


In case someone asks: I cannot use the 'provided' scope for the
Flink dependencies, because users also start (and debug) their Flink
jobs locally. Setting the dependencies to 'provided' would tell IDEs like
IntelliJ that the dependencies are not required, and the job would fail in
IntelliJ. (If there is a way to set the dependencies to 'provided' only
during the 'package' phase, let me know.)
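One direction I've considered but not verified: keep the Flink dependencies in 'compile' scope by default (so the IDE resolves them), and add a profile that re-declares them with 'provided' scope, activated only when building the deployment jar. A sketch (the profile id and property name are made up):

```xml
<profiles>
  <profile>
    <!-- activate only when building the jar for the cluster:
         mvn clean package -Pbuild-jar -->
    <id>build-jar</id>
    <dependencies>
      <!-- re-declaring the same groupId:artifactId here overrides
           the scope from the base <dependencies> section -->
      <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>${flink.version}</version>
        <scope>provided</scope>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

Without the profile, a normal `mvn package` (and the IDE import) would still see the dependency in 'compile' scope.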

I hope somebody here has a solution for us.

Regards,
Robert

Re: Build fat jar transitively excluding some dependencies

Posted by Robert Metzger <rm...@apache.org>.
Cool. Thanks for the pointer.
I'll have a closer look at the IDEA ticket you mentioned.


Re: Build fat jar transitively excluding some dependencies

Posted by Ben Podgursky <bp...@gmail.com>.
Yeah, I understand the problem with the main method. FYI, there's an
active IntelliJ ticket with more discussion about this:

https://youtrack.jetbrains.com/issue/IDEA-107048

Another kind of awkward workaround is to create a maven run configuration
with the desired scope:

https://www.jetbrains.com/idea/help/creating-maven-run-debug-configuration.html

(so the run configuration args would be "exec:java
-Dexec.mainClass=classname -Dexec.classpathScope=test")
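For reference, the same thing can be pinned in the POM instead of being passed on the command line; exec-maven-plugin takes a classpathScope parameter in its configuration. A sketch (the main class name is a placeholder):

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <version>1.3.2</version>
  <configuration>
    <!-- 'test' classpath scope also includes 'provided'
         dependencies, so main() can run from the IDE -->
    <mainClass>com.example.MyFlinkJob</mainClass>
    <classpathScope>test</classpathScope>
  </configuration>
</plugin>
```

With that in place, `mvn exec:java` needs no extra -D flags.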


Re: Build fat jar transitively excluding some dependencies

Posted by Robert Metzger <rm...@apache.org>.
Thank you for the replies.

Martin: Will the maven-assembly-plugin also exclude transitive dependencies?

Ben: Our users are used to running Flink programs from the main() method, so
with 'provided' scope IntelliJ would not include the required jars in the
classpath.
I basically want some dependencies to be 'provided' when the
maven-assembly-plugin runs, but otherwise they should be in the 'compile'
scope.


Re: Build fat jar transitively excluding some dependencies

Posted by Ben Podgursky <bp...@gmail.com>.
We package our job jars using Maven assemblies with the 'provided'
scope to exclude the hadoop jars, and use IntelliJ for local development and
testing. We've found that it's easiest to do all local debugging
using JUnit tests, since provided jars will be on the classpath there (if
you don't want a test to run during actual unit testing, you can @Ignore the
class).

Not super elegant, but it works, and it encourages people to do testing via
actual tests rather than manual scripts.
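The assembly setup described above would look roughly like this (a sketch; the built-in jar-with-dependencies descriptor bundles compile/runtime dependencies and leaves 'provided' ones out by default, which is what keeps the hadoop jars out of the jar):

```xml
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>2.4</version>
  <configuration>
    <descriptorRefs>
      <!-- bundles compile/runtime-scope dependencies;
           'provided'-scope dependencies are omitted -->
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>single</goal></goals>
    </execution>
  </executions>
</plugin>
```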
