Posted to commits@beam.apache.org by "Thomas Weise (JIRA)" <ji...@apache.org> on 2017/05/12 14:06:04 UTC

[jira] [Commented] (BEAM-2270) Examples archetype bundles Hadoop 2.6 in its jar for ApexRunner; cannot run on Hadoop 2.7?

    [ https://issues.apache.org/jira/browse/BEAM-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008174#comment-16008174 ] 

Thomas Weise commented on BEAM-2270:
------------------------------------

The Apex runner is actually set up to do the right thing if the packaging allows for it: it will exclude the user-supplied Hadoop dependencies and use the Hadoop install on the cluster instead. This is what happens in the example project. So even though you launch with Hadoop 2.6 in the client classpath, these dependencies are not shipped to the cluster; they are stripped by the launcher.

The native Apex launcher follows a similar approach: Hadoop dependencies are not included in the application package (the .apa archive that contains all other dependencies).

With shading, all bets are off because we now have a jumbo jar that also contains Hadoop, and that jar will be shipped to the cluster.
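
One possible way to keep Hadoop out of the shaded jar, sketched here as an assumption and not as a tested fix for the archetype, is to exclude the Hadoop artifacts in the shade configuration. The fragment assumes the profile builds its bundled jar with maven-shade-plugin; the plugin version and the rest of the existing configuration are left out and would stay as they are in the generated pom.xml:

{code}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <excludes>
        <!-- Assumption: dropping every artifact in the org.apache.hadoop
             group is enough to leave Hadoop to the cluster install. -->
        <exclude>org.apache.hadoop:*</exclude>
      </excludes>
    </artifactSet>
  </configuration>
</plugin>
{code}

This pattern only matches artifacts whose groupId is org.apache.hadoop, so third-party libraries pulled in transitively by Hadoop would still end up in the jar. Declaring the Hadoop dependencies with {{provided}} scope should have a similar effect, since the shade plugin does not bundle provided-scope dependencies.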


> Examples archetype bundles Hadoop 2.6 in its jar for ApexRunner; cannot run on Hadoop 2.7?
> ------------------------------------------------------------------------------------------
>
>                 Key: BEAM-2270
>                 URL: https://issues.apache.org/jira/browse/BEAM-2270
>             Project: Beam
>          Issue Type: Bug
>          Components: examples-java, runner-apex, sdk-java-extensions
>            Reporter: Kenneth Knowles
>            Assignee: Thomas Weise
>
> In an instantiated examples archetype, with {{-P apex-runner}}, Apex depends on Hadoop 2.6.0 and this is bundled into the examples jar.
> In order to get this to run on Hadoop 2.7.3 I added this to the profile:
> {code}
>       <properties>
>         <hadoop.version>2.7.3</hadoop.version>
>       </properties>
>  
>       <dependencies>
>         <dependency>
>           <groupId>org.apache.hadoop</groupId>
>           <artifactId>hadoop-yarn-client</artifactId>
>           <version>${hadoop.version}</version>
>         </dependency>
>         <dependency>
>           <groupId>org.apache.hadoop</groupId>
>           <artifactId>hadoop-common</artifactId>
>           <version>${hadoop.version}</version>
>         </dependency>
>       </dependencies>
> {code}
> It is not clear to me what the best path is here. Clearly the way we bundle is brittle and probably not the recommended best practice. But perhaps the runner's Hadoop deps could also be changed to {{provided}} scope.


