Posted to dev@flume.apache.org by "Mark Grover (JIRA)" <ji...@apache.org> on 2015/10/23 22:54:27 UTC
[jira] [Commented] (FLUME-2819) Kafka libs are being bundled into Flume distro

    [ https://issues.apache.org/jira/browse/FLUME-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971830#comment-14971830 ] 

Mark Grover commented on FLUME-2819:
------------------------------------

I am a committer on Apache Bigtop where we do packaging of the Hadoop ecosystem components into a single coherent distribution.

I came across this JIRA and have gone through the related JIRA FLUME-2792. Based on my experience working on Apache Bigtop, I thought I'd share my views on this topic.

I don't think marking things as provided is the right thing to do. The right thing to do, in my opinion, is to have Flume rely on a new enough version of Apache Kafka that provides those features. If there is no such version today, there is no option but to wait until Kafka releases such a version. If you use 'provided', it means that by default, Flume-Kafka integration wouldn't work out of the box. The norm usually is to use the default scope (which is compile) and have the jars bundled in the classpath (and the tarball). And Flume is already doing that with its Hadoop and Hive dependencies (see [here|https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-dataset-sink/pom.xml#L110] for example).
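
For reference, a 'provided' declaration would look roughly like the sketch below. The {{kafka_2.10}} artifactId and the {{kafka.version}} property are placeholders chosen for illustration, not necessarily what Flume's pom uses.

{code:xml}
<!-- Sketch only: artifactId and version property are illustrative assumptions -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.10</artifactId>
  <version>${kafka.version}</version>
  <!-- 'provided': on the compile classpath, but expected to be supplied by the
       runtime environment, so it is normally left out of the packaged tarball -->
  <scope>provided</scope>
</dependency>
{code}

Dropping the {{<scope>}} element gives the default compile scope, which is what lets the jar be bundled.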

Now, you may see {{<optional>true</optional>}} there, but that has nothing to do with the scope. The scope of that dependency is still compile; it is just optional. Folks usually use optional dependencies in Maven when the dependency being included is 'too bulky'. That may be because, say, the server and client classes are all bundled in the same jar. So if someone (let's call this C) is depending on your project (let's call this B), you don't want to clutter up their classpath with transitive server dependencies from the dependency your project is adding (let's call that A). The best way to deal with that is with a Maven optional dependency. When it comes to building your project (B) alone, the optional tag has no impact; it's as if it doesn't exist. However, when someone else depends on your project (i.e. project C depending on B), marking A as optional means that C doesn't pull in the bulky dependency A transitively. You can read up on optional dependencies [here|http://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html].
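
As a sketch of the A/B/C example, using made-up {{com.example}} coordinates and versions:

{code:xml}
<!-- pom.xml of project B: A is still compile scope, but marked optional -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>project-a</artifactId>
  <version>1.0</version>
  <optional>true</optional>
</dependency>

<!-- pom.xml of project C: depending on B does not pull in A transitively -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>project-b</artifactId>
  <version>1.0</version>
</dependency>
{code}

If C does need classes from A, it has to declare A as a dependency itself.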

So, my recommendation in this case would be: if you are using a bulky Kafka dependency (i.e. not the Kafka client jar), use the default (i.e. compile) scope with the optional tag. And if you are using a kafka-client dependency, simply use the default (i.e. compile) scope.
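
Roughly, the two cases might look like this in the pom. This is only a sketch: I'm assuming {{kafka_2.10}} as the bulky artifact and {{kafka-clients}} as the client jar, and the {{kafka.version}} property is a placeholder.

{code:xml}
<!-- Case 1: bulky Kafka jar. Default (compile) scope, marked optional -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.10</artifactId>
  <version>${kafka.version}</version>
  <optional>true</optional>
</dependency>

<!-- Case 2: client-only jar. Default (compile) scope, nothing else needed -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>${kafka.version}</version>
</dependency>
{code}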

> Kafka libs are being bundled into Flume distro
> ----------------------------------------------
>
>                 Key: FLUME-2819
>                 URL: https://issues.apache.org/jira/browse/FLUME-2819
>             Project: Flume
>          Issue Type: Bug
>            Reporter: Roshan Naik
>
> Kafka dependency libs need to be marked as 'provided' in the pom.xml 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)