Posted to dev@oozie.apache.org by Attila Sasvari via Review Board <no...@reviews.apache.org> on 2017/11/16 13:50:28 UTC

Review Request 63875: OOZIE-2900 Retrieve tokens for oozie.launcher.mapreduce.job.hdfs-servers before submission

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63875/
-----------------------------------------------------------

Review request for oozie, Peter Bacsko and Robert Kanter.


Repository: oozie-git


Description
-------

Before Oozie on YARN, MapReduce's ``JobSubmitter`` (more precisely ``TokenCache.obtainTokensForNamenodes``) took care of obtaining delegation tokens for the HDFS NameNodes specified by ``oozie.launcher.mapreduce.job.hdfs-servers`` before submitting the Oozie launcher job.

The Oozie launcher is now a YARN Application Master. It needs HDFS delegation tokens to be able to copy files between secure clusters via the Oozie DistCp action.

Changes:
- ``JavaActionExecutor`` was modified to handle the DistCp-related parameters ``oozie.launcher.mapreduce.job.hdfs-servers`` and ``oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude``.
- ``HDFSCredentials`` was changed to reuse ``TokenCache.obtainTokensForNamenodes`` to obtain HDFS delegation tokens.
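As a rough illustration of what these two properties carry, the helper below splits the comma-separated ``oozie.launcher.mapreduce.job.hdfs-servers`` value into NameNode URIs and applies the ``token-renewal.exclude`` host list. This is only a sketch, not the patch itself: the class and method names are hypothetical, and the real code hands the resulting URIs to Hadoop's ``TokenCache.obtainTokensForNamenodes``. Note that excluded hosts still receive tokens; only automatic renewal is skipped for them.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative only: mimics how the two launcher-level properties could be
 * parsed before delegation tokens are requested. Names are hypothetical;
 * the actual token fetch is done by TokenCache.obtainTokensForNamenodes.
 */
public class HdfsServersConfig {

    /** Splits the comma-separated hdfs-servers value into NameNode URIs. */
    static List<String> parseNameNodes(String hdfsServers) {
        List<String> result = new ArrayList<>();
        if (hdfsServers == null || hdfsServers.trim().isEmpty()) {
            return result;  // nothing configured: no extra tokens needed
        }
        for (String s : hdfsServers.split(",")) {
            String trimmed = s.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    /**
     * Returns the NameNodes whose tokens would still be auto-renewed.
     * Hosts on the exclude list still get tokens, but renewal is skipped
     * for them (useful when the remote cluster is in another realm and
     * the local ResourceManager cannot renew its tokens).
     */
    static List<String> renewableNameNodes(List<String> nameNodes, String excludeList) {
        if (excludeList == null || excludeList.trim().isEmpty()) {
            return nameNodes;
        }
        List<String> result = new ArrayList<>();
        for (String nn : nameNodes) {
            boolean excluded = false;
            for (String host : excludeList.split(",")) {
                if (nn.contains(host.trim())) {
                    excluded = true;
                    break;
                }
            }
            if (!excluded) {
                result.add(nn);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> nns = parseNameNodes(
                "hdfs://oozie.test1.com:8020,hdfs://remote.test2.com:8020");
        System.out.println(nns.size());  // 2
        List<String> renewable = renewableNameNodes(nns, "remote.test2.com");
        System.out.println(renewable.get(0));  // hdfs://oozie.test1.com:8020
    }
}
```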


Diffs
-----

  core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java 92a7ebe 
  core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java 8cb76cf 


Diff: https://reviews.apache.org/r/63875/diff/1/


Testing
-------

Tested on a secure cluster that the Oozie DistCp action can copy a file from another secure cluster that uses a different Kerberos realm.

- workflow:
```
<workflow-app name="DistCp" xmlns="uri:oozie:workflow:0.5">
    <start to="distcp-3a1f"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="distcp-3a1f">
        <distcp xmlns="uri:oozie:distcp-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>oozie.launcher.mapreduce.job.dfs.namenode.kerberos.principal.pattern</name>
                    <value>*</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.hdfs-servers</name>
                    <value>hdfs://oozie.test1.com:8020,hdfs://remote.test2.com:8020</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude</name>
                    <value>remote.test2.com</value>
                </property>
            </configuration>
            <arg>hdfs://remote.test2.com:8020/tmp/1</arg>
            <arg>hdfs://oozie.test1.com:8020/tmp/2</arg>
        </distcp>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
```

Prior to executing the workflow I had to set up cross-realm trust between the two secure test clusters. It involved:
- changing the Kerberos configuration ``/etc/krb5.conf`` (adding the realms and setting additional properties such as ``udp_preference_limit = 1``)
- regenerating service credentials
- changing HDFS settings (such as ``dfs.namenode.kerberos.principal.pattern``) and adding a ``hadoop.security.auth_to_local`` rule like ``RULE:[2:$1](.*)s/(.*)/$1/g``
- additional configuration to enable trust between the test Hadoop clusters
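The first bullet above can be illustrated with a ``/etc/krb5.conf`` fragment along these lines. The realm and KDC host names here are placeholders, not the ones used in the test; a real cross-realm setup additionally needs matching ``krbtgt/TEST2.COM@TEST1.COM`` principals (with identical passwords) created in both KDCs:

```
[libdefaults]
  udp_preference_limit = 1

[realms]
  TEST1.COM = {
    kdc = kdc.test1.com
    admin_server = kdc.test1.com
  }
  TEST2.COM = {
    kdc = kdc.test2.com
    admin_server = kdc.test2.com
  }

[domain_realm]
  .test1.com = TEST1.COM
  .test2.com = TEST2.COM
```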


Thanks,

Attila Sasvari


Re: Review Request 63875: OOZIE-2900 Retrieve tokens for oozie.launcher.mapreduce.job.hdfs-servers before submission

Posted by Peter Cseh via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63875/#review191334
-----------------------------------------------------------




core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java
Lines 72-76 (patched)
<https://reviews.apache.org/r/63875/#comment269117>

    Can you add some info-level logging or validation of the properties? E.g. what happens if the JOB_NAMENODES property is not null, but empty?



core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java
Lines 110 (patched)
<https://reviews.apache.org/r/63875/#comment269116>

    Please add logging similar to the other code path here.



core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java
Lines 1375 (patched)
<https://reviews.apache.org/r/63875/#comment269119>

    Please create a constant for this as well.



core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java
Lines 1378 (patched)
<https://reviews.apache.org/r/63875/#comment269118>

    You've created a constant for this but haven't used it. Please do so.


- Peter Cseh


On Nov. 16, 2017, 1:50 p.m., Attila Sasvari wrote:


Re: Review Request 63875: OOZIE-2900 Retrieve tokens for oozie.launcher.mapreduce.job.hdfs-servers before submission

Posted by Robert Kanter via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63875/#review191285
-----------------------------------------------------------




core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java
Lines 63-66 (patched)
<https://reviews.apache.org/r/63875/#comment269046>

    Given that this stuff is only needed if jobNameNodes is not null, we should move it inside the if block so we don't do it unnecessarily.



core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java
Lines 1404 (patched)
<https://reviews.apache.org/r/63875/#comment269050>

    This probably shouldn't be hardcoded here :)


- Robert Kanter


On Nov. 16, 2017, 1:50 p.m., Attila Sasvari wrote:


Re: Review Request 63875: OOZIE-2900 Retrieve tokens for oozie.launcher.mapreduce.job.hdfs-servers before submission

Posted by Peter Cseh via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63875/#review191662
-----------------------------------------------------------




core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java
Lines 1379-1386 (patched)
<https://reviews.apache.org/r/63875/#comment269504>

    Can this part be pushed down to DistcpActionExecutor? It does not feel like other actions would have to work with these properties.
    Also, please consider adding some logging here, at least at debug level, to make it easier to see what's happening.


- Peter Cseh


On Nov. 21, 2017, 3:49 p.m., Attila Sasvari wrote:


Re: Review Request 63875: OOZIE-2900 Retrieve tokens for oozie.launcher.mapreduce.job.hdfs-servers before submission

Posted by András Piros via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63875/#review191832
-----------------------------------------------------------


Ship it!




Ship It!

- András Piros


On Nov. 24, 2017, 10:29 a.m., Attila Sasvari wrote:


Re: Review Request 63875: OOZIE-2900 Retrieve tokens for oozie.launcher.mapreduce.job.hdfs-servers before submission

Posted by Peter Bacsko via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63875/#review191831
-----------------------------------------------------------


Ship it!




Ship It!

- Peter Bacsko


On Nov. 24, 2017, 10:29 a.m., Attila Sasvari wrote:


Re: Review Request 63875: OOZIE-2900 Retrieve tokens for oozie.launcher.mapreduce.job.hdfs-servers before submission

Posted by Peter Bacsko via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63875/#review191830
-----------------------------------------------------------


Ship it!




Ship It!

- Peter Bacsko


On Nov. 24, 2017, 10:29 a.m., Attila Sasvari wrote:


Re: Review Request 63875: OOZIE-2900 Retrieve tokens for oozie.launcher.mapreduce.job.hdfs-servers before submission

Posted by Attila Sasvari via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63875/
-----------------------------------------------------------

(Updated Nov. 24, 2017, 10:29 a.m.)


Review request for oozie, Peter Bacsko and Robert Kanter.


Repository: oozie-git


Description
-------

Before Oozie on YARN, MapReduce's ``JobSubmitter`` (more precisely ``TokenCache.obtainTokensForNamenodes``) took care of obtaining delegation tokens for the HDFS NameNodes specified by ``oozie.launcher.mapreduce.job.hdfs-servers`` before submitting the Oozie launcher job.

The Oozie launcher is now a YARN Application Master. It needs HDFS delegation tokens to be able to copy files between secure clusters via the Oozie DistCp action.

Changes:
- ``JavaActionExecutor`` was modified to handle the DistCp-related parameters ``oozie.launcher.mapreduce.job.hdfs-servers`` and ``oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude``.
- ``HDFSCredentials`` was changed to reuse ``TokenCache.obtainTokensForNamenodes`` to obtain HDFS delegation tokens.


Diffs (updated)
-----

  core/src/main/java/org/apache/oozie/action/hadoop/DistcpActionExecutor.java 81e28f722d9ecd0bf972bf2d0a684d207547d165 
  core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java 92a7ebe9a7876b6400d80356d5c826e77575e2ab 
  core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java a1df304914b73d406e986409a8053c2a48e1bd38 


Diff: https://reviews.apache.org/r/63875/diff/6/

Changes: https://reviews.apache.org/r/63875/diff/5-6/


Testing
-------

Tested on a secure cluster that the Oozie DistCp action can copy a file from another secure cluster that uses a different Kerberos realm.

- workflow:
```
<workflow-app name="DistCp" xmlns="uri:oozie:workflow:0.5">
    <start to="distcp-3a1f"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="distcp-3a1f">
        <distcp xmlns="uri:oozie:distcp-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>oozie.launcher.mapreduce.job.dfs.namenode.kerberos.principal.pattern</name>
                    <value>*</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.hdfs-servers</name>
                    <value>hdfs://oozie.test1.com:8020,hdfs://remote.test2.com:8020</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude</name>
                    <value>remote.test2.com</value>
                </property>
            </configuration>
            <arg>hdfs://remote.test2.com:8020/tmp/1</arg>
            <arg>hdfs://oozie.test1.com:8020/tmp/2</arg>
        </distcp>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
```

Prior to executing the workflow, I had to set up cross-realm trust between the secure test clusters. It involved:
- changing the Kerberos configuration ``/etc/krb5.conf`` (adding realms and setting additional properties like ``udp_preference_limit = 1``)
- regenerating service credentials
- changing HDFS settings (such as ``dfs.namenode.kerberos.principal.pattern``) and setting a Hadoop ``auth_to_local`` rule like ``RULE:[2:$1](.*)s/(.*)/$1/g``
- additional configuration to enable trust between the test Hadoop clusters
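To make the ``auth_to_local`` step above a bit more concrete, here is a greatly simplified sketch of what a rule like ``RULE:[2:$1](.*)s/(.*)/$1/g`` does to a two-component principal. This is an illustration only, not Hadoop's actual ``KerberosName`` rule engine, and the principal names are made up:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AuthToLocalSketch {
    private static final Pattern TWO_COMPONENT =
            Pattern.compile("([^/@]+)/([^@]+)@(.+)");

    // Simplified: RULE:[2:$1] keeps only the first (primary) component of a
    // primary/instance@REALM principal; the trailing s/(.*)/$1/g sed rule in
    // the example is an identity substitution, so nothing else changes.
    static String shortName(String principal) {
        Matcher m = TWO_COMPONENT.matcher(principal);
        if (!m.matches()) {
            throw new IllegalArgumentException(
                    "expected a primary/instance@REALM principal: " + principal);
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        // e.g. the remote cluster's HDFS service principal maps to the
        // local short name "hdfs"
        System.out.println(shortName("hdfs/remote.test2.com@TEST2.COM"));
    }
}
```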


Thanks,

Attila Sasvari


Re: Review Request 63875: OOZIE-2900 Retrieve tokens for oozie.launcher.mapreduce.job.hdfs-servers before submission

Posted by Attila Sasvari via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63875/
-----------------------------------------------------------

(Updated Nov. 24, 2017, 10:17 a.m.)


Review request for oozie, Peter Bacsko and Robert Kanter.


Repository: oozie-git


Description
-------

Before Oozie on YARN, ``JobSubmitter`` from MapReduce (more precisely ``TokenCache.obtainTokensForNamenodes``) took care of obtaining delegation tokens for the HDFS namenodes specified by ``oozie.launcher.mapreduce.job.hdfs-servers`` before submitting the Oozie launcher job.

The Oozie launcher is now a YARN Application Master. It needs HDFS delegation tokens to be able to copy files between secure clusters via the Oozie DistCp action.

Changes:
- ``JavaActionExecutor`` was modified to handle the DistCp-related parameters ``oozie.launcher.mapreduce.job.hdfs-servers`` and ``oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude``
- ``HDFSCredentials`` was changed to reuse ``TokenCache.obtainTokensForNamenodes`` to obtain HDFS delegation tokens.


Diffs (updated)
-----

  core/src/main/java/org/apache/oozie/action/hadoop/DistcpActionExecutor.java 81e28f722d9ecd0bf972bf2d0a684d207547d165 
  core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java 92a7ebe9a7876b6400d80356d5c826e77575e2ab 
  core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java a1df304914b73d406e986409a8053c2a48e1bd38 


Diff: https://reviews.apache.org/r/63875/diff/5/

Changes: https://reviews.apache.org/r/63875/diff/4-5/


Testing
-------

Verified on a secure cluster that the Oozie DistCp action can copy a file from another secure cluster that uses a different Kerberos realm.

- workflow:
```
<workflow-app name="DistCp" xmlns="uri:oozie:workflow:0.5">
    <start to="distcp-3a1f"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="distcp-3a1f">
        <distcp xmlns="uri:oozie:distcp-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>

            <configuration>
                <property>
                    <name>oozie.launcher.mapreduce.job.dfs.namenode.kerberos.principal.pattern</name>
                    <value>*</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.hdfs-servers</name>
                    <value>hdfs://oozie.test1.com:8020,hdfs://remote.test2.com:8020</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude</name>
                    <value>remote.test2.com</value>
                </property>
            </configuration>
            <arg>hdfs://remote.test2.com:8020/tmp/1</arg>
            <arg>hdfs://oozie.test1.com:8020/tmp/2</arg>
        </distcp>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
```

Prior to executing the workflow, I had to set up cross-realm trust between the secure test clusters. It involved:
- changing the Kerberos configuration ``/etc/krb5.conf`` (adding realms and setting additional properties like ``udp_preference_limit = 1``)
- regenerating service credentials
- changing HDFS settings (such as ``dfs.namenode.kerberos.principal.pattern``) and setting a Hadoop ``auth_to_local`` rule like ``RULE:[2:$1](.*)s/(.*)/$1/g``
- additional configuration to enable trust between the test Hadoop clusters


Thanks,

Attila Sasvari


Re: Review Request 63875: OOZIE-2900 Retrieve tokens for oozie.launcher.mapreduce.job.hdfs-servers before submission

Posted by Peter Bacsko via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63875/#review191802
-----------------------------------------------------------




core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java
Lines 1405 (patched)
<https://reviews.apache.org/r/63875/#comment269714>

    Minor:
    1. Please add Javadoc to this method, explaining how and when subclasses should override it.
    2. Add a "// nop" comment to the method body (it indicates that the body is empty on purpose).


- Peter Bacsko


On Nov. 22, 2017, 3:11 p.m., Attila Sasvari wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63875/
> -----------------------------------------------------------
> 
> (Updated Nov. 22, 2017, 3:11 p.m.)
> 
> 
> Review request for oozie, Peter Bacsko and Robert Kanter.
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> Before Oozie on YARN, ``JobSubmitter`` from MapReduce (more precisely ``TokenCache.obtainTokensForNamenodes``) took care of obtaining delegation tokens for the HDFS namenodes specified by ``oozie.launcher.mapreduce.job.hdfs-servers`` before submitting the Oozie launcher job.
> 
> The Oozie launcher is now a YARN Application Master. It needs HDFS delegation tokens to be able to copy files between secure clusters via the Oozie DistCp action.
> 
> Changes:
> - ``JavaActionExecutor`` was modified to handle the DistCp-related parameters ``oozie.launcher.mapreduce.job.hdfs-servers`` and ``oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude``
> - ``HDFSCredentials`` was changed to reuse ``TokenCache.obtainTokensForNamenodes`` to obtain HDFS delegation tokens.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/action/hadoop/DistcpActionExecutor.java 81e28f722d9ecd0bf972bf2d0a684d207547d165 
>   core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java 92a7ebe9a7876b6400d80356d5c826e77575e2ab 
>   core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java a1df304914b73d406e986409a8053c2a48e1bd38 
> 
> 
> Diff: https://reviews.apache.org/r/63875/diff/4/
> 
> 
> Testing
> -------
> 
> Verified on a secure cluster that the Oozie DistCp action can copy a file from another secure cluster that uses a different Kerberos realm.
> 
> - workflow:
> ```
> <workflow-app name="DistCp" xmlns="uri:oozie:workflow:0.5">
>     <start to="distcp-3a1f"/>
>     <kill name="Kill">
>         <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
>     </kill>
>     <action name="distcp-3a1f">
>         <distcp xmlns="uri:oozie:distcp-action:0.1">
>             <job-tracker>${jobTracker}</job-tracker>
>             <name-node>${nameNode}</name-node>
> 
> <configuration>
>   <property>
>     <name>oozie.launcher.mapreduce.job.dfs.namenode.kerberos.principal.pattern</name>
>     <value>*</value>
>   </property>
> <property>
>   <name>oozie.launcher.mapreduce.job.hdfs-servers</name>
>   <value>hdfs://oozie.test1.com:8020,hdfs://remote.test2.com:8020</value>
> </property>
>  
>                 <property>
>                     <name>oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude</name>
>                     <value>remote.test2.com</value>
>                 </property>
> </configuration>
>               <arg>hdfs://remote.test2.com:8020/tmp/1</arg>
>               <arg>hdfs://oozie.test1.com:8020/tmp/2</arg>
>         </distcp>
>         <ok to="End"/>
>         <error to="Kill"/>
>     </action>
>     <end name="End"/>
> </workflow-app>
> ```
> 
> Prior to executing the workflow, I had to set up cross-realm trust between the secure test clusters. It involved:
> - changing the Kerberos configuration ``/etc/krb5.conf`` (adding realms and setting additional properties like ``udp_preference_limit = 1``)
> - regenerating service credentials
> - changing HDFS settings (such as ``dfs.namenode.kerberos.principal.pattern``) and setting a Hadoop ``auth_to_local`` rule like ``RULE:[2:$1](.*)s/(.*)/$1/g``
> - additional configuration to enable trust between the test Hadoop clusters
> 
> 
> Thanks,
> 
> Attila Sasvari
> 
>


Re: Review Request 63875: OOZIE-2900 Retrieve tokens for oozie.launcher.mapreduce.job.hdfs-servers before submission

Posted by András Piros via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63875/#review191804
-----------------------------------------------------------




core/src/main/java/org/apache/oozie/action/hadoop/DistcpActionExecutor.java
Lines 39-42 (patched)
<https://reviews.apache.org/r/63875/#comment269716>

    It would be nice to have field level Javadoc here explaining why those are needed. Also linking to Hadoop repo for similar properties would be nice.



core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java
Lines 59 (patched)
<https://reviews.apache.org/r/63875/#comment269718>

    Would have the `INFO` log inside the delegate method.



core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java
Lines 98-111 (patched)
<https://reviews.apache.org/r/63875/#comment269717>

    An `INFO` level log message stating which tokens are obtained from where, similar to the other method, would be nice.



core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java
Lines 1375-1377 (patched)
<https://reviews.apache.org/r/63875/#comment269719>

    Some `DEBUG` level logging...


- András Piros


On Nov. 22, 2017, 3:11 p.m., Attila Sasvari wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63875/
> -----------------------------------------------------------
> 
> (Updated Nov. 22, 2017, 3:11 p.m.)
> 
> 
> Review request for oozie, Peter Bacsko and Robert Kanter.
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> Before Oozie on YARN, ``JobSubmitter`` from MapReduce (more precisely ``TokenCache.obtainTokensForNamenodes``) took care of obtaining delegation tokens for the HDFS namenodes specified by ``oozie.launcher.mapreduce.job.hdfs-servers`` before submitting the Oozie launcher job.
> 
> The Oozie launcher is now a YARN Application Master. It needs HDFS delegation tokens to be able to copy files between secure clusters via the Oozie DistCp action.
> 
> Changes:
> - ``JavaActionExecutor`` was modified to handle the DistCp-related parameters ``oozie.launcher.mapreduce.job.hdfs-servers`` and ``oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude``
> - ``HDFSCredentials`` was changed to reuse ``TokenCache.obtainTokensForNamenodes`` to obtain HDFS delegation tokens.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/action/hadoop/DistcpActionExecutor.java 81e28f722d9ecd0bf972bf2d0a684d207547d165 
>   core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java 92a7ebe9a7876b6400d80356d5c826e77575e2ab 
>   core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java a1df304914b73d406e986409a8053c2a48e1bd38 
> 
> 
> Diff: https://reviews.apache.org/r/63875/diff/4/
> 
> 
> Testing
> -------
> 
> Verified on a secure cluster that the Oozie DistCp action can copy a file from another secure cluster that uses a different Kerberos realm.
> 
> - workflow:
> ```
> <workflow-app name="DistCp" xmlns="uri:oozie:workflow:0.5">
>     <start to="distcp-3a1f"/>
>     <kill name="Kill">
>         <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
>     </kill>
>     <action name="distcp-3a1f">
>         <distcp xmlns="uri:oozie:distcp-action:0.1">
>             <job-tracker>${jobTracker}</job-tracker>
>             <name-node>${nameNode}</name-node>
> 
> <configuration>
>   <property>
>     <name>oozie.launcher.mapreduce.job.dfs.namenode.kerberos.principal.pattern</name>
>     <value>*</value>
>   </property>
> <property>
>   <name>oozie.launcher.mapreduce.job.hdfs-servers</name>
>   <value>hdfs://oozie.test1.com:8020,hdfs://remote.test2.com:8020</value>
> </property>
>  
>                 <property>
>                     <name>oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude</name>
>                     <value>remote.test2.com</value>
>                 </property>
> </configuration>
>               <arg>hdfs://remote.test2.com:8020/tmp/1</arg>
>               <arg>hdfs://oozie.test1.com:8020/tmp/2</arg>
>         </distcp>
>         <ok to="End"/>
>         <error to="Kill"/>
>     </action>
>     <end name="End"/>
> </workflow-app>
> ```
> 
> Prior to executing the workflow, I had to set up cross-realm trust between the secure test clusters. It involved:
> - changing the Kerberos configuration ``/etc/krb5.conf`` (adding realms and setting additional properties like ``udp_preference_limit = 1``)
> - regenerating service credentials
> - changing HDFS settings (such as ``dfs.namenode.kerberos.principal.pattern``) and setting a Hadoop ``auth_to_local`` rule like ``RULE:[2:$1](.*)s/(.*)/$1/g``
> - additional configuration to enable trust between the test Hadoop clusters
> 
> 
> Thanks,
> 
> Attila Sasvari
> 
>


Re: Review Request 63875: OOZIE-2900 Retrieve tokens for oozie.launcher.mapreduce.job.hdfs-servers before submission

Posted by Attila Sasvari via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63875/
-----------------------------------------------------------

(Updated Nov. 22, 2017, 3:11 p.m.)


Review request for oozie, Peter Bacsko and Robert Kanter.


Changes
-------

Minor refactoring: moved the DistCp-specific settings required for obtaining HDFS tokens to ``DistcpActionExecutor``.


Repository: oozie-git


Description
-------

Before Oozie on YARN, ``JobSubmitter`` from MapReduce (more precisely ``TokenCache.obtainTokensForNamenodes``) took care of obtaining delegation tokens for the HDFS namenodes specified by ``oozie.launcher.mapreduce.job.hdfs-servers`` before submitting the Oozie launcher job.

The Oozie launcher is now a YARN Application Master. It needs HDFS delegation tokens to be able to copy files between secure clusters via the Oozie DistCp action.

Changes:
- ``JavaActionExecutor`` was modified to handle the DistCp-related parameters ``oozie.launcher.mapreduce.job.hdfs-servers`` and ``oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude``
- ``HDFSCredentials`` was changed to reuse ``TokenCache.obtainTokensForNamenodes`` to obtain HDFS delegation tokens.


Diffs (updated)
-----

  core/src/main/java/org/apache/oozie/action/hadoop/DistcpActionExecutor.java 81e28f722d9ecd0bf972bf2d0a684d207547d165 
  core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java 92a7ebe9a7876b6400d80356d5c826e77575e2ab 
  core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java a1df304914b73d406e986409a8053c2a48e1bd38 


Diff: https://reviews.apache.org/r/63875/diff/4/

Changes: https://reviews.apache.org/r/63875/diff/3-4/


Testing
-------

Verified on a secure cluster that the Oozie DistCp action can copy a file from another secure cluster that uses a different Kerberos realm.

- workflow:
```
<workflow-app name="DistCp" xmlns="uri:oozie:workflow:0.5">
    <start to="distcp-3a1f"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="distcp-3a1f">
        <distcp xmlns="uri:oozie:distcp-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>

            <configuration>
                <property>
                    <name>oozie.launcher.mapreduce.job.dfs.namenode.kerberos.principal.pattern</name>
                    <value>*</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.hdfs-servers</name>
                    <value>hdfs://oozie.test1.com:8020,hdfs://remote.test2.com:8020</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude</name>
                    <value>remote.test2.com</value>
                </property>
            </configuration>
            <arg>hdfs://remote.test2.com:8020/tmp/1</arg>
            <arg>hdfs://oozie.test1.com:8020/tmp/2</arg>
        </distcp>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
```

Prior to executing the workflow, I had to set up cross-realm trust between the secure test clusters. It involved:
- changing the Kerberos configuration ``/etc/krb5.conf`` (adding realms and setting additional properties like ``udp_preference_limit = 1``)
- regenerating service credentials
- changing HDFS settings (such as ``dfs.namenode.kerberos.principal.pattern``) and setting a Hadoop ``auth_to_local`` rule like ``RULE:[2:$1](.*)s/(.*)/$1/g``
- additional configuration to enable trust between the test Hadoop clusters


Thanks,

Attila Sasvari


Re: Review Request 63875: OOZIE-2900 Retrieve tokens for oozie.launcher.mapreduce.job.hdfs-servers before submission

Posted by Attila Sasvari via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63875/
-----------------------------------------------------------

(Updated Nov. 21, 2017, 3:49 p.m.)


Review request for oozie, Peter Bacsko and Robert Kanter.


Repository: oozie-git


Description
-------

Before Oozie on YARN, ``JobSubmitter`` from MapReduce (more precisely ``TokenCache.obtainTokensForNamenodes``) took care of obtaining delegation tokens for the HDFS namenodes specified by ``oozie.launcher.mapreduce.job.hdfs-servers`` before submitting the Oozie launcher job.

The Oozie launcher is now a YARN Application Master. It needs HDFS delegation tokens to be able to copy files between secure clusters via the Oozie DistCp action.

Changes:
- ``JavaActionExecutor`` was modified to handle the DistCp-related parameters ``oozie.launcher.mapreduce.job.hdfs-servers`` and ``oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude``
- ``HDFSCredentials`` was changed to reuse ``TokenCache.obtainTokensForNamenodes`` to obtain HDFS delegation tokens.


Diffs (updated)
-----

  core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java 92a7ebe9a7876b6400d80356d5c826e77575e2ab 
  core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java a1df304914b73d406e986409a8053c2a48e1bd38 


Diff: https://reviews.apache.org/r/63875/diff/3/

Changes: https://reviews.apache.org/r/63875/diff/2-3/


Testing
-------

Verified on a secure cluster that the Oozie DistCp action can copy a file from another secure cluster that uses a different Kerberos realm.

- workflow:
```
<workflow-app name="DistCp" xmlns="uri:oozie:workflow:0.5">
    <start to="distcp-3a1f"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="distcp-3a1f">
        <distcp xmlns="uri:oozie:distcp-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>

            <configuration>
                <property>
                    <name>oozie.launcher.mapreduce.job.dfs.namenode.kerberos.principal.pattern</name>
                    <value>*</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.hdfs-servers</name>
                    <value>hdfs://oozie.test1.com:8020,hdfs://remote.test2.com:8020</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude</name>
                    <value>remote.test2.com</value>
                </property>
            </configuration>
            <arg>hdfs://remote.test2.com:8020/tmp/1</arg>
            <arg>hdfs://oozie.test1.com:8020/tmp/2</arg>
        </distcp>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
```

Prior to executing the workflow, I had to set up cross-realm trust between the secure test clusters. It involved:
- changing the Kerberos configuration ``/etc/krb5.conf`` (adding realms and setting additional properties like ``udp_preference_limit = 1``)
- regenerating service credentials
- changing HDFS settings (such as ``dfs.namenode.kerberos.principal.pattern``) and setting a Hadoop ``auth_to_local`` rule like ``RULE:[2:$1](.*)s/(.*)/$1/g``
- additional configuration to enable trust between the test Hadoop clusters


Thanks,

Attila Sasvari


Re: Review Request 63875: OOZIE-2900 Retrieve tokens for oozie.launcher.mapreduce.job.hdfs-servers before submission

Posted by Attila Sasvari via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63875/
-----------------------------------------------------------

(Updated Nov. 20, 2017, 11:21 p.m.)


Review request for oozie, Peter Bacsko and Robert Kanter.


Repository: oozie-git


Description
-------

Before Oozie on YARN, ``JobSubmitter`` from MapReduce (more precisely ``TokenCache.obtainTokensForNamenodes``) took care of obtaining delegation tokens for the HDFS namenodes specified by ``oozie.launcher.mapreduce.job.hdfs-servers`` before submitting the Oozie launcher job.

The Oozie launcher is now a YARN Application Master. It needs HDFS delegation tokens to be able to copy files between secure clusters via the Oozie DistCp action.

Changes:
- ``JavaActionExecutor`` was modified to handle the DistCp-related parameters ``oozie.launcher.mapreduce.job.hdfs-servers`` and ``oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude``
- ``HDFSCredentials`` was changed to reuse ``TokenCache.obtainTokensForNamenodes`` to obtain HDFS delegation tokens.


Diffs (updated)
-----

  core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java 92a7ebe9a7876b6400d80356d5c826e77575e2ab 
  core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java a1df304914b73d406e986409a8053c2a48e1bd38 


Diff: https://reviews.apache.org/r/63875/diff/2/

Changes: https://reviews.apache.org/r/63875/diff/1-2/


Testing
-------

Verified on a secure cluster that the Oozie DistCp action can copy a file from another secure cluster that uses a different Kerberos realm.

- workflow:
```
<workflow-app name="DistCp" xmlns="uri:oozie:workflow:0.5">
    <start to="distcp-3a1f"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="distcp-3a1f">
        <distcp xmlns="uri:oozie:distcp-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>

            <configuration>
                <property>
                    <name>oozie.launcher.mapreduce.job.dfs.namenode.kerberos.principal.pattern</name>
                    <value>*</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.hdfs-servers</name>
                    <value>hdfs://oozie.test1.com:8020,hdfs://remote.test2.com:8020</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude</name>
                    <value>remote.test2.com</value>
                </property>
            </configuration>
            <arg>hdfs://remote.test2.com:8020/tmp/1</arg>
            <arg>hdfs://oozie.test1.com:8020/tmp/2</arg>
        </distcp>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>
```

Prior to executing the workflow, I had to set up cross-realm trust between the secure test clusters. It involved:
- changing the Kerberos configuration ``/etc/krb5.conf`` (adding realms and setting additional properties like ``udp_preference_limit = 1``)
- regenerating service credentials
- changing HDFS settings (such as ``dfs.namenode.kerberos.principal.pattern``) and setting a Hadoop ``auth_to_local`` rule like ``RULE:[2:$1](.*)s/(.*)/$1/g``
- additional configuration to enable trust between the test Hadoop clusters


Thanks,

Attila Sasvari


Re: Review Request 63875: OOZIE-2900 Retrieve tokens for oozie.launcher.mapreduce.job.hdfs-servers before submission

Posted by Peter Bacsko via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63875/#review191343
-----------------------------------------------------------




core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java
Lines 68 (patched)
<https://reviews.apache.org/r/63875/#comment269125>

    You can simplify this a bit:
    
    String[] nameNodes = conf.getStrings(MRJobConfig.JOB_NAMENODES);


- Peter Bacsko


On Nov. 16, 2017, 1:50 p.m., Attila Sasvari wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63875/
> -----------------------------------------------------------
> 
> (Updated Nov. 16, 2017, 1:50 p.m.)
> 
> 
> Review request for oozie, Peter Bacsko and Robert Kanter.
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> Before Oozie on YARN, ``JobSubmitter`` from MapReduce (more precisely ``TokenCache.obtainTokensForNamenodes``) took care of obtaining delegation tokens for the HDFS namenodes specified by ``oozie.launcher.mapreduce.job.hdfs-servers`` before submitting the Oozie launcher job.
> 
> The Oozie launcher is now a YARN Application Master. It needs HDFS delegation tokens to be able to copy files between secure clusters via the Oozie DistCp action.
> 
> Changes:
> - ``JavaActionExecutor`` was modified to handle the DistCp-related parameters ``oozie.launcher.mapreduce.job.hdfs-servers`` and ``oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude``
> - ``HDFSCredentials`` was changed to reuse ``TokenCache.obtainTokensForNamenodes`` to obtain HDFS delegation tokens.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java 92a7ebe 
>   core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java 8cb76cf 
> 
> 
> Diff: https://reviews.apache.org/r/63875/diff/1/
> 
> 
> Testing
> -------
> 
> Verified on a secure cluster that the Oozie DistCp action can copy a file from another secure cluster that uses a different Kerberos realm.
> 
> - workflow:
> ```
> <workflow-app name="DistCp" xmlns="uri:oozie:workflow:0.5">
>     <start to="distcp-3a1f"/>
>     <kill name="Kill">
>         <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
>     </kill>
>     <action name="distcp-3a1f">
>         <distcp xmlns="uri:oozie:distcp-action:0.1">
>             <job-tracker>${jobTracker}</job-tracker>
>             <name-node>${nameNode}</name-node>
> 
> <configuration>
>   <property>
>     <name>oozie.launcher.mapreduce.job.dfs.namenode.kerberos.principal.pattern</name>
>     <value>*</value>
>   </property>
> <property>
>   <name>oozie.launcher.mapreduce.job.hdfs-servers</name>
>   <value>hdfs://oozie.test1.com:8020,hdfs://remote.test2.com:8020</value>
> </property>
>  
>                 <property>
>                     <name>oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude</name>
>                     <value>remote.test2.com</value>
>                 </property>
> </configuration>
>               <arg>hdfs://remote.test2.com:8020/tmp/1</arg>
>               <arg>hdfs://oozie.test1.com:8020/tmp/2</arg>
>         </distcp>
>         <ok to="End"/>
>         <error to="Kill"/>
>     </action>
>     <end name="End"/>
> </workflow-app>
> ```
> 
> Prior to executing the workflow, I had to set up cross-realm trust between the secure test clusters. It involved:
> - changing the Kerberos configuration ``/etc/krb5.conf`` (adding realms and setting additional properties like ``udp_preference_limit = 1``)
> - regenerating service credentials
> - changing HDFS settings (such as ``dfs.namenode.kerberos.principal.pattern``) and setting a Hadoop ``auth_to_local`` rule like ``RULE:[2:$1](.*)s/(.*)/$1/g``
> - additional configuration to enable trust between the test Hadoop clusters
> 
> 
> Thanks,
> 
> Attila Sasvari
> 
>


Re: Review Request 63875: OOZIE-2900 Retrieve tokens for oozie.launcher.mapreduce.job.hdfs-servers before submission

Posted by Peter Cseh via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63875/#review191332
-----------------------------------------------------------




core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java
Lines 63-64 (patched)
<https://reviews.apache.org/r/63875/#comment269115>

    You could use UserGroupInformationService here.


- Peter Cseh


On Nov. 16, 2017, 1:50 p.m., Attila Sasvari wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63875/
> -----------------------------------------------------------
> 
> (Updated Nov. 16, 2017, 1:50 p.m.)
> 
> 
> Review request for oozie, Peter Bacsko and Robert Kanter.
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> Before Oozie on YARN, ``JobSubmitter`` from MapReduce (more precisely ``TokenCache.obtainTokensForNamenodes``) took care of obtaining delegation tokens for the HDFS namenodes specified by ``oozie.launcher.mapreduce.job.hdfs-servers`` before submitting the Oozie launcher job.
> 
> The Oozie launcher is now a YARN Application Master. It needs HDFS delegation tokens to be able to copy files between secure clusters via the Oozie DistCp action.
> 
> Changes:
> - ``JavaActionExecutor`` was modified to handle the DistCp-related parameters ``oozie.launcher.mapreduce.job.hdfs-servers`` and ``oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude``
> - ``HDFSCredentials`` was changed to reuse ``TokenCache.obtainTokensForNamenodes`` to obtain HDFS delegation tokens.
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/action/hadoop/HDFSCredentials.java 92a7ebe 
>   core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java 8cb76cf 
> 
> 
> Diff: https://reviews.apache.org/r/63875/diff/1/
> 
> 
> Testing
> -------
> 
> Verified on a secure cluster that the Oozie DistCp action can copy a file from another secure cluster that uses a different Kerberos realm.
> 
> - workflow:
> ```
> <workflow-app name="DistCp" xmlns="uri:oozie:workflow:0.5">
>     <start to="distcp-3a1f"/>
>     <kill name="Kill">
>         <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
>     </kill>
>     <action name="distcp-3a1f">
>         <distcp xmlns="uri:oozie:distcp-action:0.1">
>             <job-tracker>${jobTracker}</job-tracker>
>             <name-node>${nameNode}</name-node>
> 
> <configuration>
>   <property>
>     <name>oozie.launcher.mapreduce.job.dfs.namenode.kerberos.principal.pattern</name>
>     <value>*</value>
>   </property>
> <property>
>   <name>oozie.launcher.mapreduce.job.hdfs-servers</name>
>   <value>hdfs://oozie.test1.com:8020,hdfs://remote.test2.com:8020</value>
> </property>
>  
>                 <property>
>                     <name>oozie.launcher.mapreduce.job.hdfs-servers.token-renewal.exclude</name>
>                     <value>remote.test2.com</value>
>                 </property>
> </configuration>
>               <arg>hdfs://remote.test2.com:8020/tmp/1</arg>
>               <arg>hdfs://oozie.test1.com:8020/tmp/2</arg>
>         </distcp>
>         <ok to="End"/>
>         <error to="Kill"/>
>     </action>
>     <end name="End"/>
> </workflow-app>
> ```
> 
> Prior to executing the workflow, I had to set up cross-realm trust between the secure test clusters. It involved:
> - changing the Kerberos configuration ``/etc/krb5.conf`` (adding realms and setting additional properties like ``udp_preference_limit = 1``)
> - regenerating service credentials
> - changing HDFS settings (such as ``dfs.namenode.kerberos.principal.pattern``) and setting a Hadoop ``auth_to_local`` rule like ``RULE:[2:$1](.*)s/(.*)/$1/g``
> - additional configuration to enable trust between the test Hadoop clusters
> 
> 
> Thanks,
> 
> Attila Sasvari
> 
>