You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by aljoscha <gi...@git.apache.org> on 2018/07/20 08:03:15 UTC

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

GitHub user aljoscha opened a pull request:

    https://github.com/apache/flink/pull/6377

    [FLINK-8981] Add end-to-end test for running on YARN with Kerberos

    This adds a complete Docker container setup and Docker Compose file for
    starting a kerberized Hadoop cluster on Docker.
    
    The test script does the following:
     * package "build-target" Flink dist into a tarball
     * build docker container
     * start cluster using docker compose
     * upload tarball and unpack
     * modify flink-conf.yaml to use Kerberos keytab for hadoop-user
     * Run Streaming WordCount Job
     * verify results
    
    We set an exit trap before to ensure that we shut down the docker
    compose cluster at the end.
    
    As a prerequisite, this also fixes how we resolve directories in the end-to-end scripts.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aljoscha/flink jira-8981-kerberos-end-to-end-test

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6377.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6377
    
----
commit 5aec051a76089f623ebc21418ec5751f9fcad780
Author: Aljoscha Krettek <al...@...>
Date:   2018-07-18T09:51:27Z

    [hotfix] Resolve symbolic links in test scripts

commit 634426b096a36147c3180f9c732efef51155e5bb
Author: Aljoscha Krettek <al...@...>
Date:   2018-07-18T11:46:29Z

    [FLINK-8981] Add end-to-end test for running on YARN with Kerberos
    
    This adds a complete Docker container setup and Docker Compose file for
    starting a kerberized Hadoop cluster on Docker.
    
    The test script does the following:
     * package "build-target" Flink dist into a tarball
     * build docker container
     * start cluster using docker compose
     * upload tarball and unpack
     * modify flink-conf.yaml to use Kerberos keytab for hadoop-user
     * Run Streaming WordCount Job
     * verify results
    
    We set an exit trap before to ensure that we shut down the docker
    compose cluster at the end.

----


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203988838
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/bootstrap.sh ---
    @@ -0,0 +1,121 @@
    +#!/bin/bash
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +
    +: ${HADOOP_PREFIX:=/usr/local/hadoop}
    +
    +$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
    +
    +rm /tmp/*.pid
    +
    +# installing libraries if any - (resource urls added comma separated to the ACP system variable)
    +cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do  echo == $cp; curl -LO $cp ; done; cd -
    +
    +# kerberos client
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" /etc/krb5.conf
    --- End diff --
    
    `EXAMPLE.COM` is pretty buch the placeholder for this and could be replaced with a different realm in `bootstrap.sh`. But the default is just to still use `EXAMPLE.COM`. I could rename this `TEMPLATE.URL` if you want. 😅 


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203997764
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/bootstrap.sh ---
    @@ -0,0 +1,121 @@
    +#!/bin/bash
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +
    +: ${HADOOP_PREFIX:=/usr/local/hadoop}
    +
    +$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
    +
    +rm /tmp/*.pid
    +
    +# installing libraries if any - (resource urls added comma separated to the ACP system variable)
    +cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do  echo == $cp; curl -LO $cp ; done; cd -
    +
    +# kerberos client
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" /etc/krb5.conf
    --- End diff --
    
    yeah nvm, I doubt introducing a placeholder really fixes things :/


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204305537
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    +
    +RUN addgroup hadoop
    +RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
    +RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
    +RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
    +
    +RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
    +
    +# install dev tools
    +RUN apt-get update
    +RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
    +
    +# Kerberos client
    +RUN apt-get install krb5-user -y
    +RUN mkdir -p /var/log/kerberos
    +RUN touch /var/log/kerberos/kadmind.log
    +
    +# passwordless ssh
    +RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
    +RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
    +RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
    +
    +# java
    +RUN mkdir -p /usr/java/default && \
    +     curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
    +     tar --strip-components=1 -xz -C /usr/java/default/
    +
    +ENV JAVA_HOME /usr/java/default
    +ENV PATH $PATH:$JAVA_HOME/bin
    +
    +RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
    +RUN unzip jce_policy-8.zip
    +RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
    +
    +ENV HADOOP_VERSION=2.8.4
    --- End diff --
    
    I added a config option to the Dockerfile


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204330950
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml ---
    @@ -0,0 +1,87 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +version: '3.5'
    +
    +networks:
    +  docker-hadoop-cluster-network:
    +    driver: bridge
    +    name: docker-hadoop-cluster-network
    +
    +services:
    +  kdc:
    +    container_name: "kdc"
    +    hostname: kdc.kerberos.com
    +    image: sequenceiq/kerberos
    +    networks:
    +      - docker-hadoop-cluster-network
    +    environment:
    +      REALM: EXAMPLE.COM
    +      DOMAIN_REALM: kdc.kerberos.com
    +
    +  master:
    +    image: ${DOCKER_HADOOP_IMAGE_NAME:-flink/docker-hadoop-secure-cluster:latest}
    +    command: master
    +    depends_on:
    +      - kdc
    +    ports:
    +      - "50070:50070"
    --- End diff --
    
    This was because the setup was meant to be accessible for more generic use and access from outside. I'm removing it.


---

[GitHub] flink issue #6377: [FLINK-8981] Add end-to-end test for running on YARN with...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the issue:

    https://github.com/apache/flink/pull/6377
  
    @zentol & @dawidwys I think I addressed all of your comments


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by dawidwys <gi...@git.apache.org>.
Github user dawidwys commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204019505
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    --- End diff --
    
    I think commands in Dockerfile are by default executed as root. So this command is unnecessary.


---

[GitHub] flink issue #6377: [FLINK-8981] Add end-to-end test for running on YARN with...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the issue:

    https://github.com/apache/flink/pull/6377
  
    @zentol I addressed most of your comments. I now added a test in there that verifies the job fails if we don't set a keytab. I'm not running with different Hadoop. It might work but I'm basically setting up a hadoop cluster in docker and I don't know if this is similar enough (or exactly the same, for my purposes) between the versions.
    
    @dawidwys Thanks for the thorough comments, I'll go through them next!


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203989263
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    +
    +RUN addgroup hadoop
    +RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
    +RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
    +RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
    +
    +RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
    +
    +# install dev tools
    +RUN apt-get update
    +RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
    +
    +# Kerberos client
    +RUN apt-get install krb5-user -y
    +RUN mkdir -p /var/log/kerberos
    +RUN touch /var/log/kerberos/kadmind.log
    +
    +# passwordless ssh
    +RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
    +RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
    +RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
    +
    +# java
    +RUN mkdir -p /usr/java/default && \
    +     curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
    +     tar --strip-components=1 -xz -C /usr/java/default/
    +
    +ENV JAVA_HOME /usr/java/default
    +ENV PATH $PATH:$JAVA_HOME/bin
    +
    +RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
    +RUN unzip jce_policy-8.zip
    +RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
    +
    +ENV HADOOP_VERSION=2.8.4
    --- End diff --
    
    I think the solution in the long run should be to never ship Flink with a Hadoop version, i.e. make the hadoop-free version the default. 


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204307793
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    +
    +RUN addgroup hadoop
    +RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
    --- End diff --
    
    will do


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204314197
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    +
    +RUN addgroup hadoop
    +RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
    +RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
    +RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
    +
    +RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
    +
    +# install dev tools
    +RUN apt-get update
    +RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
    +
    +# Kerberos client
    +RUN apt-get install krb5-user -y
    +RUN mkdir -p /var/log/kerberos
    +RUN touch /var/log/kerberos/kadmind.log
    +
    +# passwordless ssh
    +RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
    +RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
    +RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
    +
    +# java
    +RUN mkdir -p /usr/java/default && \
    +     curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
    +     tar --strip-components=1 -xz -C /usr/java/default/
    +
    +ENV JAVA_HOME /usr/java/default
    +ENV PATH $PATH:$JAVA_HOME/bin
    +
    +RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
    +RUN unzip jce_policy-8.zip
    +RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
    +
    +ENV HADOOP_VERSION=2.8.4
    +
    +# ENV HADOOP_URL https://www.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
    +ENV HADOOP_URL http://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
    +RUN set -x \
    +    && curl -fSL "$HADOOP_URL" -o /tmp/hadoop.tar.gz \
    +    && tar -xf /tmp/hadoop.tar.gz -C /usr/local/ \
    +    && rm /tmp/hadoop.tar.gz*
    +
    +WORKDIR /usr/local
    +RUN ln -s /usr/local/hadoop-${HADOOP_VERSION} /usr/local/hadoop
    +RUN chown root:root -R /usr/local/hadoop-${HADOOP_VERSION}/
    +RUN chown root:root -R /usr/local/hadoop/
    +RUN chown root:yarn /usr/local/hadoop/bin/container-executor
    +RUN chmod 6050 /usr/local/hadoop/bin/container-executor
    +RUN mkdir -p /hadoop-data/nm-local-dirs
    +RUN mkdir -p /hadoop-data/nm-log-dirs
    +RUN chown yarn:yarn /hadoop-data
    +RUN chown yarn:yarn /hadoop-data/nm-local-dirs
    +RUN chown yarn:yarn /hadoop-data/nm-log-dirs
    +RUN chmod 755 /hadoop-data
    +RUN chmod 755 /hadoop-data/nm-local-dirs
    +RUN chmod 755 /hadoop-data/nm-log-dirs
    +
    +
    +ENV HADOOP_HOME /usr/local/hadoop
    +ENV HADOOP_COMMON_HOME /usr/local/hadoop
    +ENV HADOOP_HDFS_HOME /usr/local/hadoop
    +ENV HADOOP_MAPRED_HOME /usr/local/hadoop
    +ENV HADOOP_YARN_HOME /usr/local/hadoop
    +ENV HADOOP_CONF_DIR /usr/local/hadoop/etc/hadoop
    +ENV YARN_CONF_DIR /usr/local/hadoop/etc/hadoop
    +ENV HADOOP_LOG_DIR /var/log/hadoop
    +ENV HADOOP_BIN_HOME $HADOOP_HOME/bin
    +ENV PATH $PATH:$HADOOP_BIN_HOME
    +
    +ENV KRB_REALM EXAMPLE.COM
    +ENV DOMAIN_REALM example.com
    +ENV KERBEROS_ADMIN admin/admin
    +ENV KERBEROS_ADMIN_PASSWORD admin
    +ENV KEYTAB_DIR /etc/security/keytabs
    +
    +RUN mkdir /var/log/hadoop
    +
    +ADD config/core-site.xml $HADOOP_HOME/etc/hadoop/core-site.xml
    +ADD config/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml
    +ADD config/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml
    +ADD config/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml
    +ADD config/container-executor.cfg $HADOOP_HOME/etc/hadoop/container-executor.cfg
    +RUN chmod 400 $HADOOP_HOME/etc/hadoop/container-executor.cfg
    +RUN chown root:yarn $HADOOP_HOME/etc/hadoop/container-executor.cfg
    +# ADD config/log4j.properties $HADOOP_HOME/etc/hadoop/log4j.properties
    --- End diff --
    
    removing


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by dawidwys <gi...@git.apache.org>.
Github user dawidwys commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204017611
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    +
    +RUN addgroup hadoop
    +RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
    +RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
    +RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
    +
    +RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
    +
    +# install dev tools
    +RUN apt-get update
    +RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
    +
    +# Kerberos client
    +RUN apt-get install krb5-user -y
    +RUN mkdir -p /var/log/kerberos
    +RUN touch /var/log/kerberos/kadmind.log
    +
    +# passwordless ssh
    +RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
    +RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
    +RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
    +
    +# java
    +RUN mkdir -p /usr/java/default && \
    --- End diff --
    
    Can't we use java image as the base image?


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by dawidwys <gi...@git.apache.org>.
Github user dawidwys commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203981196
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    +
    +RUN addgroup hadoop
    +RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
    --- End diff --
    
    Could we merge such blocks in a single command? It will create less layers which should decrease both building time and size of the image.


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203972230
  
    --- Diff: flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh ---
    @@ -0,0 +1,104 @@
    +#!/usr/bin/env bash
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +set -o pipefail
    +
    +source "$(dirname "$0")"/common.sh
    +
    +FLINK_TARBALL_DIR=$TEST_DATA_DIR
    +FLINK_TARBALL=flink.tar.gz
    +FLINK_DIRNAME=$(basename $FLINK_DIR)
    +
    +echo "Flink Tarball directory $FLINK_TARBALL_DIR"
    +echo "Flink tarball filename $FLINK_TARBALL"
    +echo "Flink distribution directory name $FLINK_DIRNAME"
    +echo "End-to-end directory $END_TO_END_DIR"
    +docker --version
    +docker-compose --version
    +
    +mkdir -p $FLINK_TARBALL_DIR
    +tar czf $FLINK_TARBALL_DIR/$FLINK_TARBALL -C $(dirname $FLINK_DIR) .
    +
    +echo "Building Hadoop Docker container"
    +until docker build -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/Dockerfile -t flink/docker-hadoop-secure-cluster:latest $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/; do
    +    # with all the downloading and ubuntu updating a lot of flakiness can happen, make sure
    +    # we don't immediately fail
    +    echo "Something went wrong while building the Docker image, retrying ..."
    +    sleep 2
    +done
    +
    +echo "Starting Hadoop cluster"
    +docker-compose -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml up -d
    +
    +# make sure we stop our cluster at the end
    +function cluster_shutdown {
    +  # don't call ourselves again for another signal interruption
    +  trap "exit -1" INT
    +  # don't call ourselves again for normal exit
    +  trap "" EXIT
    +
    +  docker-compose -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml down
    +  rm $FLINK_TARBALL_DIR/$FLINK_TARBALL
    +}
    +trap cluster_shutdown INT
    +trap cluster_shutdown EXIT
    +
    +until docker cp $FLINK_TARBALL_DIR/$FLINK_TARBALL master:/home/hadoop-user/; do
    +    # we're retrying this one because we don't know yet if the container is ready
    +    echo "Uploading Flink tarball to docker master failed, retrying ..."
    +    sleep 5
    +done
    +
    +# now, at least the container is ready
    +docker exec -it master bash -c "tar xzf /home/hadoop-user/$FLINK_TARBALL --directory /home/hadoop-user/"
    +
    +docker exec -it master bash -c "echo \"security.kerberos.login.keytab: /home/hadoop-user/hadoop-user.keytab\" >> /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
    +docker exec -it master bash -c "echo \"security.kerberos.login.principal: hadoop-user\" >> /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
    +
    +echo "Flink config:"
    +docker exec -it master bash -c "cat /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
    +
    +# make the output path random, just in case it already exists, for example if we
    +# had cached docker containers
    +OUTPUT_PATH=hdfs:///user/hadoop-user/wc-out-$RANDOM
    +
    +# it's important to run this with higher parallelism, otherwise we might risk that
    +# JM and TM are on the same YARN node and that we therefore don't test the keytab shipping
    +until docker exec -it master bash -c "export HADOOP_CLASSPATH=\`hadoop classpath\` && /home/hadoop-user/$FLINK_DIRNAME/bin/flink run -m yarn-cluster -yn 3 -ys 1 -ytm 1200 -yjm 800 -p 3 /home/hadoop-user/$FLINK_DIRNAME/examples/streaming/WordCount.jar --output $OUTPUT_PATH"; do
    +    echo "Running the Flink job failed, might be that the cluster is not ready yet, retrying ..."
    --- End diff --
    
    is there no way to check whether the cluster is ready? The logs contain several submission failures due to this :/


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203973291
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    +
    +RUN addgroup hadoop
    +RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
    +RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
    +RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
    +
    +RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
    +
    +# install dev tools
    +RUN apt-get update
    +RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
    +
    +# Kerberos client
    +RUN apt-get install krb5-user -y
    +RUN mkdir -p /var/log/kerberos
    +RUN touch /var/log/kerberos/kadmind.log
    +
    +# passwordless ssh
    +RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
    +RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
    +RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
    +
    +# java
    +RUN mkdir -p /usr/java/default && \
    +     curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
    +     tar --strip-components=1 -xz -C /usr/java/default/
    +
    +ENV JAVA_HOME /usr/java/default
    +ENV PATH $PATH:$JAVA_HOME/bin
    +
    +RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
    +RUN unzip jce_policy-8.zip
    +RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
    +
    +ENV HADOOP_VERSION=2.8.4
    --- End diff --
    
    This potentially uses a different hadoop version than the one against flink-dist was built against.


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204329978
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml ---
    @@ -0,0 +1,87 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +version: '3.5'
    +
    +networks:
    +  docker-hadoop-cluster-network:
    +    driver: bridge
    +    name: docker-hadoop-cluster-network
    +
    +services:
    +  kdc:
    +    container_name: "kdc"
    +    hostname: kdc.kerberos.com
    +    image: sequenceiq/kerberos
    +    networks:
    +      - docker-hadoop-cluster-network
    +    environment:
    +      REALM: EXAMPLE.COM
    +      DOMAIN_REALM: kdc.kerberos.com
    +
    +  master:
    +    image: ${DOCKER_HADOOP_IMAGE_NAME:-flink/docker-hadoop-secure-cluster:latest}
    +    command: master
    +    depends_on:
    +      - kdc
    +    ports:
    +      - "50070:50070"
    +      - "50470:50470"
    +      - "8088:8088"
    +      - "19888:19888"
    +      - "8188:8188"
    +    container_name: "master"
    +    hostname: master.docker-hadoop-cluster-network
    +    networks:
    +      - docker-hadoop-cluster-network
    +    environment:
    +      KRB_REALM: EXAMPLE.COM
    +      DOMAIN_REALM: kdc.kerberos.com
    +
    +  slave1:
    --- End diff --
    
    I tried this at the very beginning but this doesn't work because the slaves need well formed hostnames for the Kerberos setup to work (it's tricky with the Kerberos principal names). That's why I did it like this. I also don't like it 


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by dawidwys <gi...@git.apache.org>.
Github user dawidwys commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204017355
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    +
    +RUN addgroup hadoop
    +RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
    +RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
    +RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
    +
    +RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
    +
    +# install dev tools
    +RUN apt-get update
    --- End diff --
    
    This is a Dockerfile anti-pattern that leads to some cacheing issues:
    https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#run


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204315760
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    --- End diff --
    
    removing


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by dawidwys <gi...@git.apache.org>.
Github user dawidwys commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204020995
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/bootstrap.sh ---
    @@ -0,0 +1,121 @@
    +#!/bin/bash
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +
    +: ${HADOOP_PREFIX:=/usr/local/hadoop}
    +
    +$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
    +
    +rm /tmp/*.pid
    +
    +# installing libraries if any - (resource urls added comma separated to the ACP system variable)
    +cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do  echo == $cp; curl -LO $cp ; done; cd -
    +
    +# kerberos client
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" /etc/krb5.conf
    +sed -i "s/example.com/${DOMAIN_REALM}/g" /etc/krb5.conf
    +
    +# update config files
    +sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
    +sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
    +
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
    +sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
    +sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
    +
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
    +sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
    +sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
    +
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
    +sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
    +sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
    +
    +sed -i "s#/usr/local/hadoop/bin/container-executor#${NM_CONTAINER_EXECUTOR_PATH}#g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
    +
    +# create namenode kerberos principal and keytab
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey hdfs/$(hostname -f)@${KRB_REALM}"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey mapred/$(hostname -f)@${KRB_REALM}"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey yarn/$(hostname -f)@${KRB_REALM}"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey HTTP/$(hostname -f)@${KRB_REALM}"
    +
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k hdfs.keytab hdfs/$(hostname -f) HTTP/$(hostname -f)"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k mapred.keytab mapred/$(hostname -f) HTTP/$(hostname -f)"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k yarn.keytab yarn/$(hostname -f) HTTP/$(hostname -f)"
    +
    +mkdir -p ${KEYTAB_DIR}
    +mv hdfs.keytab ${KEYTAB_DIR}
    +mv mapred.keytab ${KEYTAB_DIR}
    +mv yarn.keytab ${KEYTAB_DIR}
    +chmod 400 ${KEYTAB_DIR}/hdfs.keytab
    +chmod 400 ${KEYTAB_DIR}/mapred.keytab
    +chmod 400 ${KEYTAB_DIR}/yarn.keytab
    +chown hdfs:hadoop ${KEYTAB_DIR}/hdfs.keytab
    +chown mapred:hadoop ${KEYTAB_DIR}/mapred.keytab
    +chown yarn:hadoop ${KEYTAB_DIR}/yarn.keytab
    +
    +service ssh start
    --- End diff --
    
    Can we just make ssh start automatically in Dockerfile?


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204307777
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml ---
    @@ -0,0 +1,87 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +version: '3.5'
    +
    +networks:
    +  docker-hadoop-cluster-network:
    --- End diff --
    
    apparently we don't need it, removing


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204322221
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/bootstrap.sh ---
    @@ -0,0 +1,121 @@
    +#!/bin/bash
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +
    +: ${HADOOP_PREFIX:=/usr/local/hadoop}
    +
    +$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
    +
    +rm /tmp/*.pid
    +
    +# installing libraries if any - (resource urls added comma separated to the ACP system variable)
    +cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do  echo == $cp; curl -LO $cp ; done; cd -
    +
    +# kerberos client
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" /etc/krb5.conf
    +sed -i "s/example.com/${DOMAIN_REALM}/g" /etc/krb5.conf
    +
    +# update config files
    +sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
    +sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
    +
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
    +sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
    +sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
    +
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
    +sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
    +sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
    +
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
    +sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
    +sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
    +
    +sed -i "s#/usr/local/hadoop/bin/container-executor#${NM_CONTAINER_EXECUTOR_PATH}#g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
    +
    +# create namenode kerberos principal and keytab
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey hdfs/$(hostname -f)@${KRB_REALM}"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey mapred/$(hostname -f)@${KRB_REALM}"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey yarn/$(hostname -f)@${KRB_REALM}"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey HTTP/$(hostname -f)@${KRB_REALM}"
    +
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k hdfs.keytab hdfs/$(hostname -f) HTTP/$(hostname -f)"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k mapred.keytab mapred/$(hostname -f) HTTP/$(hostname -f)"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k yarn.keytab yarn/$(hostname -f) HTTP/$(hostname -f)"
    +
    +mkdir -p ${KEYTAB_DIR}
    +mv hdfs.keytab ${KEYTAB_DIR}
    +mv mapred.keytab ${KEYTAB_DIR}
    +mv yarn.keytab ${KEYTAB_DIR}
    +chmod 400 ${KEYTAB_DIR}/hdfs.keytab
    +chmod 400 ${KEYTAB_DIR}/mapred.keytab
    +chmod 400 ${KEYTAB_DIR}/yarn.keytab
    +chown hdfs:hadoop ${KEYTAB_DIR}/hdfs.keytab
    +chown mapred:hadoop ${KEYTAB_DIR}/mapred.keytab
    +chown yarn:hadoop ${KEYTAB_DIR}/yarn.keytab
    +
    +service ssh start
    --- End diff --
    
    from a quick search it's not easily possible: https://stackoverflow.com/questions/22886470/start-sshd-automatically-with-docker-container


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204308017
  
    --- Diff: flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh ---
    @@ -0,0 +1,104 @@
    +#!/usr/bin/env bash
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +set -o pipefail
    +
    +source "$(dirname "$0")"/common.sh
    +
    +FLINK_TARBALL_DIR=$TEST_DATA_DIR
    +FLINK_TARBALL=flink.tar.gz
    +FLINK_DIRNAME=$(basename $FLINK_DIR)
    +
    +echo "Flink Tarball directory $FLINK_TARBALL_DIR"
    +echo "Flink tarball filename $FLINK_TARBALL"
    +echo "Flink distribution directory name $FLINK_DIRNAME"
    +echo "End-to-end directory $END_TO_END_DIR"
    +docker --version
    +docker-compose --version
    +
    +mkdir -p $FLINK_TARBALL_DIR
    +tar czf $FLINK_TARBALL_DIR/$FLINK_TARBALL -C $(dirname $FLINK_DIR) .
    +
    +echo "Building Hadoop Docker container"
    +until docker build -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/Dockerfile -t flink/docker-hadoop-secure-cluster:latest $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/; do
    +    # with all the downloading and ubuntu updating a lot of flakiness can happen, make sure
    +    # we don't immediately fail
    +    echo "Something went wrong while building the Docker image, retrying ..."
    +    sleep 2
    +done
    +
    +echo "Starting Hadoop cluster"
    +docker-compose -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml up -d
    +
    +# make sure we stop our cluster at the end
    +function cluster_shutdown {
    +  # don't call ourselves again for another signal interruption
    +  trap "exit -1" INT
    +  # don't call ourselves again for normal exit
    +  trap "" EXIT
    +
    +  docker-compose -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml down
    +  rm $FLINK_TARBALL_DIR/$FLINK_TARBALL
    +}
    +trap cluster_shutdown INT
    +trap cluster_shutdown EXIT
    +
    +until docker cp $FLINK_TARBALL_DIR/$FLINK_TARBALL master:/home/hadoop-user/; do
    --- End diff --
    
    I did it like this so that rebuilding Flink does not require building the docker image. I know I could do it as one of the last steps but with repeatedly running the test locally I think it's still easier this way. WDYT?


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by dawidwys <gi...@git.apache.org>.
Github user dawidwys commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204327765
  
    --- Diff: flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh ---
    @@ -0,0 +1,104 @@
    +#!/usr/bin/env bash
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +set -o pipefail
    +
    +source "$(dirname "$0")"/common.sh
    +
    +FLINK_TARBALL_DIR=$TEST_DATA_DIR
    +FLINK_TARBALL=flink.tar.gz
    +FLINK_DIRNAME=$(basename $FLINK_DIR)
    +
    +echo "Flink Tarball directory $FLINK_TARBALL_DIR"
    +echo "Flink tarball filename $FLINK_TARBALL"
    +echo "Flink distribution directory name $FLINK_DIRNAME"
    +echo "End-to-end directory $END_TO_END_DIR"
    +docker --version
    +docker-compose --version
    +
    +mkdir -p $FLINK_TARBALL_DIR
    +tar czf $FLINK_TARBALL_DIR/$FLINK_TARBALL -C $(dirname $FLINK_DIR) .
    +
    +echo "Building Hadoop Docker container"
    +until docker build -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/Dockerfile -t flink/docker-hadoop-secure-cluster:latest $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/; do
    +    # with all the downloading and ubuntu updating a lot of flakiness can happen, make sure
    +    # we don't immediately fail
    +    echo "Something went wrong while building the Docker image, retrying ..."
    +    sleep 2
    +done
    +
    +echo "Starting Hadoop cluster"
    +docker-compose -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml up -d
    +
    +# make sure we stop our cluster at the end
    +function cluster_shutdown {
    +  # don't call ourselves again for another signal interruption
    +  trap "exit -1" INT
    +  # don't call ourselves again for normal exit
    +  trap "" EXIT
    +
    +  docker-compose -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml down
    +  rm $FLINK_TARBALL_DIR/$FLINK_TARBALL
    +}
    +trap cluster_shutdown INT
    +trap cluster_shutdown EXIT
    +
    +until docker cp $FLINK_TARBALL_DIR/$FLINK_TARBALL master:/home/hadoop-user/; do
    --- End diff --
    
    I think if we add it as one of the last steps of the Dockerfile it wouldn't make a difference in build time as all previous layers would be cached anyway. At the same time if we move it to the Dockerfile we will no longer need the loop.


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203990078
  
    --- Diff: flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh ---
    @@ -0,0 +1,104 @@
    +#!/usr/bin/env bash
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +set -o pipefail
    +
    +source "$(dirname "$0")"/common.sh
    +
    +FLINK_TARBALL_DIR=$TEST_DATA_DIR
    +FLINK_TARBALL=flink.tar.gz
    +FLINK_DIRNAME=$(basename $FLINK_DIR)
    +
    +echo "Flink Tarball directory $FLINK_TARBALL_DIR"
    +echo "Flink tarball filename $FLINK_TARBALL"
    +echo "Flink distribution directory name $FLINK_DIRNAME"
    +echo "End-to-end directory $END_TO_END_DIR"
    +docker --version
    +docker-compose --version
    +
    +mkdir -p $FLINK_TARBALL_DIR
    +tar czf $FLINK_TARBALL_DIR/$FLINK_TARBALL -C $(dirname $FLINK_DIR) .
    +
    +echo "Building Hadoop Docker container"
    +until docker build -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/Dockerfile -t flink/docker-hadoop-secure-cluster:latest $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/; do
    +    # with all the downloading and ubuntu updating a lot of flakiness can happen, make sure
    +    # we don't immediately fail
    +    echo "Something went wrong while building the Docker image, retrying ..."
    +    sleep 2
    +done
    +
    +echo "Starting Hadoop cluster"
    +docker-compose -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml up -d
    +
    +# make sure we stop our cluster at the end
    +function cluster_shutdown {
    +  # don't call ourselves again for another signal interruption
    +  trap "exit -1" INT
    +  # don't call ourselves again for normal exit
    +  trap "" EXIT
    +
    +  docker-compose -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml down
    +  rm $FLINK_TARBALL_DIR/$FLINK_TARBALL
    +}
    +trap cluster_shutdown INT
    +trap cluster_shutdown EXIT
    +
    +until docker cp $FLINK_TARBALL_DIR/$FLINK_TARBALL master:/home/hadoop-user/; do
    +    # we're retrying this one because we don't know yet if the container is ready
    +    echo "Uploading Flink tarball to docker master failed, retrying ..."
    +    sleep 5
    +done
    +
    +# now, at least the container is ready
    +docker exec -it master bash -c "tar xzf /home/hadoop-user/$FLINK_TARBALL --directory /home/hadoop-user/"
    +
    +docker exec -it master bash -c "echo \"security.kerberos.login.keytab: /home/hadoop-user/hadoop-user.keytab\" >> /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
    +docker exec -it master bash -c "echo \"security.kerberos.login.principal: hadoop-user\" >> /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
    +
    +echo "Flink config:"
    +docker exec -it master bash -c "cat /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
    +
    +# make the output path random, just in case it already exists, for example if we
    +# had cached docker containers
    +OUTPUT_PATH=hdfs:///user/hadoop-user/wc-out-$RANDOM
    +
    +# it's important to run this with higher parallelism, otherwise we might risk that
    +# JM and TM are on the same YARN node and that we therefore don't test the keytab shipping
    +until docker exec -it master bash -c "export HADOOP_CLASSPATH=\`hadoop classpath\` && /home/hadoop-user/$FLINK_DIRNAME/bin/flink run -m yarn-cluster -yn 3 -ys 1 -ytm 1200 -yjm 800 -p 3 /home/hadoop-user/$FLINK_DIRNAME/examples/streaming/WordCount.jar --output $OUTPUT_PATH"; do
    +    echo "Running the Flink job failed, might be that the cluster is not ready yet, retrying ..."
    --- End diff --
    
    I'm afraid not, that's why there are the retries around the stuff that deals with HDFS/YARN.


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by dawidwys <gi...@git.apache.org>.
Github user dawidwys commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203983391
  
    --- Diff: flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh ---
    @@ -0,0 +1,104 @@
    +#!/usr/bin/env bash
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +set -o pipefail
    +
    +source "$(dirname "$0")"/common.sh
    +
    +FLINK_TARBALL_DIR=$TEST_DATA_DIR
    +FLINK_TARBALL=flink.tar.gz
    +FLINK_DIRNAME=$(basename $FLINK_DIR)
    +
    +echo "Flink Tarball directory $FLINK_TARBALL_DIR"
    +echo "Flink tarball filename $FLINK_TARBALL"
    +echo "Flink distribution directory name $FLINK_DIRNAME"
    +echo "End-to-end directory $END_TO_END_DIR"
    +docker --version
    +docker-compose --version
    +
    +mkdir -p $FLINK_TARBALL_DIR
    +tar czf $FLINK_TARBALL_DIR/$FLINK_TARBALL -C $(dirname $FLINK_DIR) .
    +
    +echo "Building Hadoop Docker container"
    +until docker build -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/Dockerfile -t flink/docker-hadoop-secure-cluster:latest $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/; do
    +    # with all the downloading and ubuntu updating a lot of flakiness can happen, make sure
    +    # we don't immediately fail
    +    echo "Something went wrong while building the Docker image, retrying ..."
    +    sleep 2
    +done
    +
    +echo "Starting Hadoop cluster"
    +docker-compose -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml up -d
    +
    +# make sure we stop our cluster at the end
    +function cluster_shutdown {
    +  # don't call ourselves again for another signal interruption
    +  trap "exit -1" INT
    +  # don't call ourselves again for normal exit
    +  trap "" EXIT
    +
    +  docker-compose -f $END_TO_END_DIR/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml down
    +  rm $FLINK_TARBALL_DIR/$FLINK_TARBALL
    +}
    +trap cluster_shutdown INT
    +trap cluster_shutdown EXIT
    +
    +until docker cp $FLINK_TARBALL_DIR/$FLINK_TARBALL master:/home/hadoop-user/; do
    --- End diff --
    
    Can't we set it up during image build?


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by dawidwys <gi...@git.apache.org>.
Github user dawidwys commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204017957
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    +
    +RUN addgroup hadoop
    +RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
    +RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
    +RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
    +
    +RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
    +
    +# install dev tools
    +RUN apt-get update
    +RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
    +
    +# Kerberos client
    +RUN apt-get install krb5-user -y
    +RUN mkdir -p /var/log/kerberos
    +RUN touch /var/log/kerberos/kadmind.log
    +
    +# passwordless ssh
    +RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
    +RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
    +RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
    +
    +# java
    +RUN mkdir -p /usr/java/default && \
    +     curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
    +     tar --strip-components=1 -xz -C /usr/java/default/
    +
    +ENV JAVA_HOME /usr/java/default
    +ENV PATH $PATH:$JAVA_HOME/bin
    +
    +RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
    +RUN unzip jce_policy-8.zip
    +RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
    +
    +ENV HADOOP_VERSION=2.8.4
    +
    +# ENV HADOOP_URL https://www.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
    +ENV HADOOP_URL http://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
    +RUN set -x \
    +    && curl -fSL "$HADOOP_URL" -o /tmp/hadoop.tar.gz \
    +    && tar -xf /tmp/hadoop.tar.gz -C /usr/local/ \
    +    && rm /tmp/hadoop.tar.gz*
    +
    +WORKDIR /usr/local
    +RUN ln -s /usr/local/hadoop-${HADOOP_VERSION} /usr/local/hadoop
    +RUN chown root:root -R /usr/local/hadoop-${HADOOP_VERSION}/
    +RUN chown root:root -R /usr/local/hadoop/
    +RUN chown root:yarn /usr/local/hadoop/bin/container-executor
    +RUN chmod 6050 /usr/local/hadoop/bin/container-executor
    +RUN mkdir -p /hadoop-data/nm-local-dirs
    +RUN mkdir -p /hadoop-data/nm-log-dirs
    +RUN chown yarn:yarn /hadoop-data
    +RUN chown yarn:yarn /hadoop-data/nm-local-dirs
    +RUN chown yarn:yarn /hadoop-data/nm-log-dirs
    +RUN chmod 755 /hadoop-data
    +RUN chmod 755 /hadoop-data/nm-local-dirs
    +RUN chmod 755 /hadoop-data/nm-log-dirs
    +
    +
    +ENV HADOOP_HOME /usr/local/hadoop
    +ENV HADOOP_COMMON_HOME /usr/local/hadoop
    +ENV HADOOP_HDFS_HOME /usr/local/hadoop
    +ENV HADOOP_MAPRED_HOME /usr/local/hadoop
    +ENV HADOOP_YARN_HOME /usr/local/hadoop
    +ENV HADOOP_CONF_DIR /usr/local/hadoop/etc/hadoop
    +ENV YARN_CONF_DIR /usr/local/hadoop/etc/hadoop
    +ENV HADOOP_LOG_DIR /var/log/hadoop
    +ENV HADOOP_BIN_HOME $HADOOP_HOME/bin
    +ENV PATH $PATH:$HADOOP_BIN_HOME
    +
    +ENV KRB_REALM EXAMPLE.COM
    +ENV DOMAIN_REALM example.com
    +ENV KERBEROS_ADMIN admin/admin
    +ENV KERBEROS_ADMIN_PASSWORD admin
    +ENV KEYTAB_DIR /etc/security/keytabs
    +
    +RUN mkdir /var/log/hadoop
    +
    +ADD config/core-site.xml $HADOOP_HOME/etc/hadoop/core-site.xml
    +ADD config/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml
    +ADD config/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml
    +ADD config/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml
    +ADD config/container-executor.cfg $HADOOP_HOME/etc/hadoop/container-executor.cfg
    +RUN chmod 400 $HADOOP_HOME/etc/hadoop/container-executor.cfg
    +RUN chown root:yarn $HADOOP_HOME/etc/hadoop/container-executor.cfg
    +# ADD config/log4j.properties $HADOOP_HOME/etc/hadoop/log4j.properties
    --- End diff --
    
    remove?


---

[GitHub] flink issue #6377: [FLINK-8981] Add end-to-end test for running on YARN with...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the issue:

    https://github.com/apache/flink/pull/6377
  
    This PR adds the test to `flink-ci`: https://github.com/zentol/flink-ci/pull/1
    
    This is a run on my on `flink-ci` fork where the test is run five times without issue: https://travis-ci.org/aljoscha/flink-ci/builds/405995875


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204327123
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/README.md ---
    @@ -0,0 +1,118 @@
    +# Apache Hadoop Docker image with Kerberos enabled
    +
    +This image is modified version of Knappek/docker-hadoop-secure
    + * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +
    +With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    + * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +
    +And a lot of added stuff for making this an actual, properly configured, kerberized cluster with proper user/permissions structure.
    +
    +Versions
    +--------
    +
    +* JDK8
    +* Hadoop 2.8.3
    +
    +Default Environment Variables
    +-----------------------------
    +
    +| Name | Value | Description |
    +| ---- | ----  | ---- |
    +| `KRB_REALM` | `EXAMPLE.COM` | The Kerberos Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
    +| `DOMAIN_REALM` | `example.com` | The Kerberos Domain Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
    +| `KERBEROS_ADMIN` | `admin/admin` | The KDC admin user |
    +| `KERBEROS_ADMIN_PASSWORD` | `admin` | The KDC admin password |
    +
    +You can simply define these variables in the `docker-compose.yml`.
    +
    +Run image
    +---------
    +
    +Clone the [Github project](https://github.com/aljoscha/docker-hadoop-secure-cluster) and run
    +
    +```
    +docker-compose up
    +```
    +
    +Usage
    +-----
    +
    +Get the container name with `docker ps` and login to the container with
    +
    +```
    +docker exec -it <container-name> /bin/bash
    +```
    +
    +
    +To obtain a Kerberos ticket, execute
    +
    +```
    +kinit -kt /home/hadoop-user/hadoop-user.keytab hadoop-user
    +```
    +
    +Afterwards you can use `hdfs` CLI like
    +
    +```
    +hdfs dfs -ls /
    +```
    +
    +
    +Known issues
    +------------
    +
    +### Unable to obtain Kerberos password
    +
    +#### Error
    +docker-compose up fails for the first time with the error
    +
    +```
    +Login failure for nn/hadoop.docker.com@EXAMPLE.COM from keytab /etc/security/keytabs/nn.service.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user
    +```
    +
    +#### Solution
    +
    +Stop the containers with `docker-compose down` and start again with `docker-compose up -d`.
    +
    +
    +### JDK 8
    +
    +Make sure you use download a JDK version that is still available. Old versions can be deprecated by Oracle and thus the download link won't be able anymore.
    +
    +Get the latest JDK8 Download URL with
    +
    +```
    +curl -s https://lv.binarybabel.org/catalog-api/java/jdk8.json
    +```
    +
    +### Java Keystore
    +
    +If the Keystroe has been expired, then create a new `keystore.jks`:
    --- End diff --
    
    fixing the typo but we need the keystore for the SSL setup, which we seem to need for the Kerberos setup



---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by dawidwys <gi...@git.apache.org>.
Github user dawidwys commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204328745
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/README.md ---
    @@ -0,0 +1,118 @@
    +# Apache Hadoop Docker image with Kerberos enabled
    +
    +This image is modified version of Knappek/docker-hadoop-secure
    + * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +
    +With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    + * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +
    +And a lot of added stuff for making this an actual, properly configured, kerberized cluster with proper user/permissions structure.
    +
    +Versions
    +--------
    +
    +* JDK8
    +* Hadoop 2.8.3
    +
    +Default Environment Variables
    +-----------------------------
    +
    +| Name | Value | Description |
    +| ---- | ----  | ---- |
    +| `KRB_REALM` | `EXAMPLE.COM` | The Kerberos Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
    +| `DOMAIN_REALM` | `example.com` | The Kerberos Domain Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
    +| `KERBEROS_ADMIN` | `admin/admin` | The KDC admin user |
    +| `KERBEROS_ADMIN_PASSWORD` | `admin` | The KDC admin password |
    +
    +You can simply define these variables in the `docker-compose.yml`.
    +
    +Run image
    +---------
    +
    +Clone the [Github project](https://github.com/aljoscha/docker-hadoop-secure-cluster) and run
    +
    +```
    +docker-compose up
    +```
    +
    +Usage
    +-----
    +
    +Get the container name with `docker ps` and login to the container with
    +
    +```
    +docker exec -it <container-name> /bin/bash
    +```
    +
    +
    +To obtain a Kerberos ticket, execute
    +
    +```
    +kinit -kt /home/hadoop-user/hadoop-user.keytab hadoop-user
    +```
    +
    +Afterwards you can use `hdfs` CLI like
    +
    +```
    +hdfs dfs -ls /
    +```
    +
    +
    +Known issues
    +------------
    +
    +### Unable to obtain Kerberos password
    +
    +#### Error
    +docker-compose up fails for the first time with the error
    +
    +```
    +Login failure for nn/hadoop.docker.com@EXAMPLE.COM from keytab /etc/security/keytabs/nn.service.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user
    +```
    +
    +#### Solution
    +
    +Stop the containers with `docker-compose down` and start again with `docker-compose up -d`.
    +
    +
    +### JDK 8
    +
    +Make sure you use download a JDK version that is still available. Old versions can be deprecated by Oracle and thus the download link won't be able anymore.
    +
    +Get the latest JDK8 Download URL with
    +
    +```
    +curl -s https://lv.binarybabel.org/catalog-api/java/jdk8.json
    +```
    +
    +### Java Keystore
    +
    +If the Keystroe has been expired, then create a new `keystore.jks`:
    --- End diff --
    
    Yes, I rather meant if the expiring of the keystore might be a problem. Could we create the keystore in test?
    
    What is the expiry time for the keystore you use? Maybe setting it to some big number will be enough, but I think the default (365 days) might cause some troubles.


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203974298
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/bootstrap.sh ---
    @@ -0,0 +1,121 @@
    +#!/bin/bash
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +
    +: ${HADOOP_PREFIX:=/usr/local/hadoop}
    +
    +$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
    +
    +rm /tmp/*.pid
    +
    +# installing libraries if any - (resource urls added comma separated to the ACP system variable)
    +cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do  echo == $cp; curl -LO $cp ; done; cd -
    +
    +# kerberos client
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" /etc/krb5.conf
    --- End diff --
    
    `EXAMPLE.COM` is used in several places, is there any way we can set this in a single place? (for example with search&replace if necessary)


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203973314
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    +
    +RUN addgroup hadoop
    +RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
    +RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
    +RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
    +
    +RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
    +
    +# install dev tools
    +RUN apt-get update
    +RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
    +
    +# Kerberos client
    +RUN apt-get install krb5-user -y
    +RUN mkdir -p /var/log/kerberos
    +RUN touch /var/log/kerberos/kadmind.log
    +
    +# passwordless ssh
    +RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
    +RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
    +RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
    +
    +# java
    +RUN mkdir -p /usr/java/default && \
    +     curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
    +     tar --strip-components=1 -xz -C /usr/java/default/
    +
    +ENV JAVA_HOME /usr/java/default
    +ENV PATH $PATH:$JAVA_HOME/bin
    +
    +RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
    +RUN unzip jce_policy-8.zip
    +RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
    +
    +ENV HADOOP_VERSION=2.8.4
    +
    +# ENV HADOOP_URL https://www.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
    --- End diff --
    
    remove


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203990327
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/README.md ---
    @@ -0,0 +1,118 @@
    +# Apache Hadoop Docker image with Kerberos enabled
    +
    +This image is modified version of Knappek/docker-hadoop-secure
    + * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +
    +With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    + * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +
    +And a lot of added stuff for making this an actual, properly configured, kerberized cluster with proper user/permissions structure.
    +
    +Versions
    +--------
    +
    +* JDK8
    +* Hadoop 2.8.3
    +
    +Default Environment Variables
    +-----------------------------
    +
    +| Name | Value | Description |
    +| ---- | ----  | ---- |
    +| `KRB_REALM` | `EXAMPLE.COM` | The Kerberos Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
    +| `DOMAIN_REALM` | `example.com` | The Kerberos Domain Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
    +| `KERBEROS_ADMIN` | `admin/admin` | The KDC admin user |
    +| `KERBEROS_ADMIN_PASSWORD` | `admin` | The KDC admin password |
    +
    +You can simply define these variables in the `docker-compose.yml`.
    +
    +Run image
    +---------
    +
    +Clone the [Github project](https://github.com/aljoscha/docker-hadoop-secure-cluster) and run
    --- End diff --
    
    fixing


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203989967
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/config/log4j.properties ---
    @@ -0,0 +1,354 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +
    +# Define some default values that can be overridden by system properties
    +hadoop.root.logger=INFO,console
    +hadoop.log.dir=.
    +hadoop.log.file=hadoop.log
    +
    +# Define the root logger to the system property "hadoop.root.logger".
    +log4j.rootLogger=${hadoop.root.logger}, EventCounter
    +
    +# Logging Threshold
    +log4j.threshold=ALL
    +
    +# Null Appender
    +log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
    +
    +#
    +# Rolling File Appender - cap space usage at 5gb.
    +#
    +hadoop.log.maxfilesize=256MB
    +hadoop.log.maxbackupindex=20
    +log4j.appender.RFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
    +
    +log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
    +log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
    +
    +log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
    +
    +# Pattern format: Date LogLevel LoggerName LogMessage
    +log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    +# Debugging Pattern format
    +#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
    +
    +
    +#
    +# Daily Rolling File Appender
    +#
    +
    +log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
    +log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
    +
    +# Rollover at midnight
    +log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
    +
    +log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
    +
    +# Pattern format: Date LogLevel LoggerName LogMessage
    +log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    +# Debugging Pattern format
    +#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
    +
    +
    +#
    +# console
    +# Add "console" to rootlogger above if you want to use this
    +#
    +
    +log4j.appender.console=org.apache.log4j.ConsoleAppender
    +log4j.appender.console.target=System.err
    +log4j.appender.console.layout=org.apache.log4j.PatternLayout
    +log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
    +
    +#
    +# TaskLog Appender
    +#
    +
    +#Default values
    +hadoop.tasklog.taskid=null
    +hadoop.tasklog.iscleanup=false
    +hadoop.tasklog.noKeepSplits=4
    +hadoop.tasklog.totalLogFileSize=100
    +hadoop.tasklog.purgeLogSplits=true
    +hadoop.tasklog.logsRetainHours=12
    +
    +log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
    +log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
    +log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
    +log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
    +
    +log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    +
    +#
    +# HDFS block state change log from block manager
    +#
    +# Uncomment the following to log normal block state change
    +# messages from BlockManager in NameNode.
    +#log4j.logger.BlockStateChange=DEBUG
    +
    +#
    +#Security appender
    +#
    +hadoop.security.logger=INFO,NullAppender
    +hadoop.security.log.maxfilesize=256MB
    +hadoop.security.log.maxbackupindex=20
    +log4j.category.SecurityLogger=${hadoop.security.logger}
    +hadoop.security.log.file=SecurityAuth-${user.name}.audit
    +log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
    +log4j.appender.RFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
    +log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
    +log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    +log4j.appender.RFAS.MaxFileSize=${hadoop.security.log.maxfilesize}
    +log4j.appender.RFAS.MaxBackupIndex=${hadoop.security.log.maxbackupindex}
    +
    +#
    +# Daily Rolling Security appender
    +#
    +log4j.appender.DRFAS=org.apache.log4j.DailyRollingFileAppender
    +log4j.appender.DRFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
    +log4j.appender.DRFAS.layout=org.apache.log4j.PatternLayout
    +log4j.appender.DRFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    +log4j.appender.DRFAS.DatePattern=.yyyy-MM-dd
    +
    +#
    +# hadoop configuration logging
    +#
    +
    +# Uncomment the following line to turn off configuration deprecation warnings.
    +# log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN
    +
    +#
    +# hdfs audit logging
    +#
    +hdfs.audit.logger=INFO,NullAppender
    +hdfs.audit.log.maxfilesize=256MB
    +hdfs.audit.log.maxbackupindex=20
    +log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
    +log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
    +log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
    +log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
    +log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
    +log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.appender.RFAAUDIT.MaxFileSize=${hdfs.audit.log.maxfilesize}
    +log4j.appender.RFAAUDIT.MaxBackupIndex=${hdfs.audit.log.maxbackupindex}
    +
    +#
    +# NameNode metrics logging.
    +# The default is to retain two namenode-metrics.log files up to 64MB each.
    +#
    +namenode.metrics.logger=INFO,NullAppender
    +log4j.logger.NameNodeMetricsLog=${namenode.metrics.logger}
    +log4j.additivity.NameNodeMetricsLog=false
    +log4j.appender.NNMETRICSRFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.NNMETRICSRFA.File=${hadoop.log.dir}/namenode-metrics.log
    +log4j.appender.NNMETRICSRFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.NNMETRICSRFA.layout.ConversionPattern=%d{ISO8601} %m%n
    +log4j.appender.NNMETRICSRFA.MaxBackupIndex=1
    +log4j.appender.NNMETRICSRFA.MaxFileSize=64MB
    +
    +#
    +# DataNode metrics logging.
    +# The default is to retain two datanode-metrics.log files up to 64MB each.
    +#
    +datanode.metrics.logger=INFO,NullAppender
    +log4j.logger.DataNodeMetricsLog=${datanode.metrics.logger}
    +log4j.additivity.DataNodeMetricsLog=false
    +log4j.appender.DNMETRICSRFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.DNMETRICSRFA.File=${hadoop.log.dir}/datanode-metrics.log
    +log4j.appender.DNMETRICSRFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.DNMETRICSRFA.layout.ConversionPattern=%d{ISO8601} %m%n
    +log4j.appender.DNMETRICSRFA.MaxBackupIndex=1
    +log4j.appender.DNMETRICSRFA.MaxFileSize=64MB
    +
    +#
    +# mapred audit logging
    +#
    +mapred.audit.logger=INFO,NullAppender
    +mapred.audit.log.maxfilesize=256MB
    +mapred.audit.log.maxbackupindex=20
    +log4j.logger.org.apache.hadoop.mapred.AuditLogger=${mapred.audit.logger}
    +log4j.additivity.org.apache.hadoop.mapred.AuditLogger=false
    +log4j.appender.MRAUDIT=org.apache.log4j.RollingFileAppender
    +log4j.appender.MRAUDIT.File=${hadoop.log.dir}/mapred-audit.log
    +log4j.appender.MRAUDIT.layout=org.apache.log4j.PatternLayout
    +log4j.appender.MRAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.appender.MRAUDIT.MaxFileSize=${mapred.audit.log.maxfilesize}
    +log4j.appender.MRAUDIT.MaxBackupIndex=${mapred.audit.log.maxbackupindex}
    +
    +# Custom Logging levels
    +
    +#log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
    +#log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
    +#log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=DEBUG
    +
    +# Jets3t library
    +log4j.logger.org.jets3t.service.impl.rest.httpclient.RestS3Service=ERROR
    +
    +# AWS SDK & S3A FileSystem
    +log4j.logger.com.amazonaws=ERROR
    +log4j.logger.com.amazonaws.http.AmazonHttpClient=ERROR
    +log4j.logger.org.apache.hadoop.fs.s3a.S3AFileSystem=WARN
    +
    +#
    +# Event Counter Appender
    +# Sends counts of logging messages at different severity levels to Hadoop Metrics.
    +#
    +log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
    +
    +#
    +# Job Summary Appender
    +#
    +# Use following logger to send summary to separate file defined by
    +# hadoop.mapreduce.jobsummary.log.file :
    +# hadoop.mapreduce.jobsummary.logger=INFO,JSA
    +#
    +hadoop.mapreduce.jobsummary.logger=${hadoop.root.logger}
    +hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log
    +hadoop.mapreduce.jobsummary.log.maxfilesize=256MB
    +hadoop.mapreduce.jobsummary.log.maxbackupindex=20
    +log4j.appender.JSA=org.apache.log4j.RollingFileAppender
    +log4j.appender.JSA.File=${hadoop.log.dir}/${hadoop.mapreduce.jobsummary.log.file}
    +log4j.appender.JSA.MaxFileSize=${hadoop.mapreduce.jobsummary.log.maxfilesize}
    +log4j.appender.JSA.MaxBackupIndex=${hadoop.mapreduce.jobsummary.log.maxbackupindex}
    +log4j.appender.JSA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.JSA.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
    +log4j.logger.org.apache.hadoop.mapred.JobInProgress$JobSummary=${hadoop.mapreduce.jobsummary.logger}
    +log4j.additivity.org.apache.hadoop.mapred.JobInProgress$JobSummary=false
    +
    +#
    +# shuffle connection log from shuffleHandler
    +# Uncomment the following line to enable logging of shuffle connections
    +# log4j.logger.org.apache.hadoop.mapred.ShuffleHandler.audit=DEBUG
    +
    +#
    +# Yarn ResourceManager Application Summary Log
    +#
    +# Set the ResourceManager summary log filename
    +yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log
    +# Set the ResourceManager summary log level and appender
    +yarn.server.resourcemanager.appsummary.logger=${hadoop.root.logger}
    +#yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY
    +
    +# To enable AppSummaryLogging for the RM,
    +# set yarn.server.resourcemanager.appsummary.logger to
    +# <LEVEL>,RMSUMMARY in hadoop-env.sh
    +
    +# Appender for ResourceManager Application Summary Log
    +# Requires the following properties to be set
    +#    - hadoop.log.dir (Hadoop Log directory)
    +#    - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename)
    +#    - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender)
    +
    +log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
    +log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
    +log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
    +log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
    +log4j.appender.RMSUMMARY.MaxFileSize=256MB
    +log4j.appender.RMSUMMARY.MaxBackupIndex=20
    +log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
    +log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +
    +# HS audit log configs
    +#mapreduce.hs.audit.logger=INFO,HSAUDIT
    +#log4j.logger.org.apache.hadoop.mapreduce.v2.hs.HSAuditLogger=${mapreduce.hs.audit.logger}
    +#log4j.additivity.org.apache.hadoop.mapreduce.v2.hs.HSAuditLogger=false
    +#log4j.appender.HSAUDIT=org.apache.log4j.DailyRollingFileAppender
    +#log4j.appender.HSAUDIT.File=${hadoop.log.dir}/hs-audit.log
    +#log4j.appender.HSAUDIT.layout=org.apache.log4j.PatternLayout
    +#log4j.appender.HSAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +#log4j.appender.HSAUDIT.DatePattern=.yyyy-MM-dd
    +
    +# Http Server Request Logs
    +#log4j.logger.http.requests.namenode=INFO,namenoderequestlog
    +#log4j.appender.namenoderequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    +#log4j.appender.namenoderequestlog.Filename=${hadoop.log.dir}/jetty-namenode-yyyy_mm_dd.log
    +#log4j.appender.namenoderequestlog.RetainDays=3
    +
    +#log4j.logger.http.requests.datanode=INFO,datanoderequestlog
    +#log4j.appender.datanoderequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    +#log4j.appender.datanoderequestlog.Filename=${hadoop.log.dir}/jetty-datanode-yyyy_mm_dd.log
    +#log4j.appender.datanoderequestlog.RetainDays=3
    +
    +#log4j.logger.http.requests.resourcemanager=INFO,resourcemanagerrequestlog
    +#log4j.appender.resourcemanagerrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    +#log4j.appender.resourcemanagerrequestlog.Filename=${hadoop.log.dir}/jetty-resourcemanager-yyyy_mm_dd.log
    +#log4j.appender.resourcemanagerrequestlog.RetainDays=3
    +
    +#log4j.logger.http.requests.jobhistory=INFO,jobhistoryrequestlog
    +#log4j.appender.jobhistoryrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    +#log4j.appender.jobhistoryrequestlog.Filename=${hadoop.log.dir}/jetty-jobhistory-yyyy_mm_dd.log
    +#log4j.appender.jobhistoryrequestlog.RetainDays=3
    +
    +#log4j.logger.http.requests.nodemanager=INFO,nodemanagerrequestlog
    +#log4j.appender.nodemanagerrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    +#log4j.appender.nodemanagerrequestlog.Filename=${hadoop.log.dir}/jetty-nodemanager-yyyy_mm_dd.log
    +#log4j.appender.nodemanagerrequestlog.RetainDays=3
    +
    +# Appender for viewing information for errors and warnings
    +yarn.ewma.cleanupInterval=300
    +yarn.ewma.messageAgeLimitSeconds=86400
    +yarn.ewma.maxUniqueMessages=250
    +log4j.appender.EWMA=org.apache.hadoop.yarn.util.Log4jWarningErrorMetricsAppender
    +log4j.appender.EWMA.cleanupInterval=${yarn.ewma.cleanupInterval}
    +log4j.appender.EWMA.messageAgeLimitSeconds=${yarn.ewma.messageAgeLimitSeconds}
    +log4j.appender.EWMA.maxUniqueMessages=${yarn.ewma.maxUniqueMessages}
    +
    +## NameNode log
    +log4j.appender.NAMENODE_RFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.NAMENODE_RFA.File=${hadoop.log.dir}/hadoop-namenode.log
    +log4j.appender.NAMENODE_RFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.NAMENODE_RFA.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.logger.org.apache.hadoop.hdfs.server.namenode=INFO,NAMENODE_RFA
    +
    +## DataNode log
    +log4j.appender.DATANODE_RFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.DATANODE_RFA.File=${hadoop.log.dir}/hadoop-datanode.log
    +log4j.appender.DATANODE_RFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.DATANODE_RFA.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.logger.org.apache.hadoop.hdfs.server.datanode=INFO,DATANODE_RFA
    +
    +## ResourceManager log
    +log4j.appender.RESOURCEMANAGER_RFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.RESOURCEMANAGER_RFA.File=${hadoop.log.dir}/hadoop-resourcemanager.log
    +log4j.appender.RESOURCEMANAGER_RFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.RESOURCEMANAGER_RFA.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.logger.org.apache.hadoop.yarn.server.resourcemanager=INFO,RESOURCEMANAGER_RFA
    +
    +## NodeManager log
    +log4j.appender.NODEMANAGER_RFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.NODEMANAGER_RFA.File=${hadoop.log.dir}/hadoop-nodemanager.log
    +log4j.appender.NODEMANAGER_RFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.NODEMANAGER_RFA.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.logger.org.apache.hadoop.yarn.server.nodemanager=INFO,NODEMANAGER_RFA
    +
    +## HistoryServer log
    +log4j.appender.HISTORYSERVER_RFA=org.apache.log4j.RollingFileAppender
    --- End diff --
    
    hmm, maybe they help in debugging. Leave them in just in case?


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204314392
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    +
    +RUN addgroup hadoop
    +RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
    +RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
    +RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
    +
    +RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
    +
    +# install dev tools
    +RUN apt-get update
    +RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
    +
    +# Kerberos client
    +RUN apt-get install krb5-user -y
    +RUN mkdir -p /var/log/kerberos
    +RUN touch /var/log/kerberos/kadmind.log
    +
    +# passwordless ssh
    +RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
    +RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
    +RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
    +
    +# java
    +RUN mkdir -p /usr/java/default && \
    --- End diff --
    
    I think I possibly could but I don't know exactly what else I then would need to setup to make the whole Hadoop thing work


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203989036
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    +
    +RUN addgroup hadoop
    +RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
    +RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
    +RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
    +
    +RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
    +
    +# install dev tools
    +RUN apt-get update
    +RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
    +
    +# Kerberos client
    +RUN apt-get install krb5-user -y
    +RUN mkdir -p /var/log/kerberos
    +RUN touch /var/log/kerberos/kadmind.log
    +
    +# passwordless ssh
    +RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
    +RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
    +RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
    +
    +# java
    +RUN mkdir -p /usr/java/default && \
    +     curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
    +     tar --strip-components=1 -xz -C /usr/java/default/
    +
    +ENV JAVA_HOME /usr/java/default
    +ENV PATH $PATH:$JAVA_HOME/bin
    +
    +RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
    +RUN unzip jce_policy-8.zip
    +RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
    +
    +ENV HADOOP_VERSION=2.8.4
    +
    +# ENV HADOOP_URL https://www.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
    --- End diff --
    
    removing


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by dawidwys <gi...@git.apache.org>.
Github user dawidwys commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204057806
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml ---
    @@ -0,0 +1,87 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +version: '3.5'
    +
    +networks:
    +  docker-hadoop-cluster-network:
    +    driver: bridge
    +    name: docker-hadoop-cluster-network
    +
    +services:
    +  kdc:
    +    container_name: "kdc"
    +    hostname: kdc.kerberos.com
    +    image: sequenceiq/kerberos
    +    networks:
    +      - docker-hadoop-cluster-network
    +    environment:
    +      REALM: EXAMPLE.COM
    +      DOMAIN_REALM: kdc.kerberos.com
    +
    +  master:
    +    image: ${DOCKER_HADOOP_IMAGE_NAME:-flink/docker-hadoop-secure-cluster:latest}
    +    command: master
    +    depends_on:
    +      - kdc
    +    ports:
    +      - "50070:50070"
    +      - "50470:50470"
    +      - "8088:8088"
    +      - "19888:19888"
    +      - "8188:8188"
    +    container_name: "master"
    +    hostname: master.docker-hadoop-cluster-network
    +    networks:
    +      - docker-hadoop-cluster-network
    +    environment:
    +      KRB_REALM: EXAMPLE.COM
    +      DOMAIN_REALM: kdc.kerberos.com
    +
    +  slave1:
    --- End diff --
    
    Maybe create just one slave and just use `docker-compose scale`? You run flink from within container anyway. So It could all be uatomatical.


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203989614
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/config/log4j.properties ---
    @@ -0,0 +1,354 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +
    +# Define some default values that can be overridden by system properties
    +hadoop.root.logger=INFO,console
    +hadoop.log.dir=.
    +hadoop.log.file=hadoop.log
    +
    +# Define the root logger to the system property "hadoop.root.logger".
    +log4j.rootLogger=${hadoop.root.logger}, EventCounter
    +
    +# Logging Threshold
    +log4j.threshold=ALL
    +
    +# Null Appender
    +log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
    +
    +#
    +# Rolling File Appender - cap space usage at 5gb.
    +#
    +hadoop.log.maxfilesize=256MB
    +hadoop.log.maxbackupindex=20
    +log4j.appender.RFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
    +
    +log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
    +log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
    +
    +log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
    +
    +# Pattern format: Date LogLevel LoggerName LogMessage
    +log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    +# Debugging Pattern format
    +#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
    +
    +
    +#
    +# Daily Rolling File Appender
    +#
    +
    +log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
    +log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
    +
    +# Rollover at midnight
    +log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
    +
    +log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
    +
    +# Pattern format: Date LogLevel LoggerName LogMessage
    +log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    +# Debugging Pattern format
    +#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
    +
    +
    +#
    +# console
    +# Add "console" to rootlogger above if you want to use this
    +#
    +
    +log4j.appender.console=org.apache.log4j.ConsoleAppender
    +log4j.appender.console.target=System.err
    +log4j.appender.console.layout=org.apache.log4j.PatternLayout
    +log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
    +
    +#
    +# TaskLog Appender
    +#
    +
    +#Default values
    +hadoop.tasklog.taskid=null
    +hadoop.tasklog.iscleanup=false
    +hadoop.tasklog.noKeepSplits=4
    +hadoop.tasklog.totalLogFileSize=100
    +hadoop.tasklog.purgeLogSplits=true
    +hadoop.tasklog.logsRetainHours=12
    +
    +log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
    +log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
    +log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
    +log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
    +
    +log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    +
    +#
    +# HDFS block state change log from block manager
    +#
    +# Uncomment the following to log normal block state change
    +# messages from BlockManager in NameNode.
    +#log4j.logger.BlockStateChange=DEBUG
    +
    +#
    +#Security appender
    +#
    +hadoop.security.logger=INFO,NullAppender
    +hadoop.security.log.maxfilesize=256MB
    +hadoop.security.log.maxbackupindex=20
    +log4j.category.SecurityLogger=${hadoop.security.logger}
    +hadoop.security.log.file=SecurityAuth-${user.name}.audit
    +log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
    +log4j.appender.RFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
    +log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
    +log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    +log4j.appender.RFAS.MaxFileSize=${hadoop.security.log.maxfilesize}
    +log4j.appender.RFAS.MaxBackupIndex=${hadoop.security.log.maxbackupindex}
    +
    +#
    +# Daily Rolling Security appender
    +#
    +log4j.appender.DRFAS=org.apache.log4j.DailyRollingFileAppender
    +log4j.appender.DRFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
    +log4j.appender.DRFAS.layout=org.apache.log4j.PatternLayout
    +log4j.appender.DRFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    +log4j.appender.DRFAS.DatePattern=.yyyy-MM-dd
    +
    +#
    +# hadoop configuration logging
    +#
    +
    +# Uncomment the following line to turn off configuration deprecation warnings.
    +# log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN
    +
    +#
    +# hdfs audit logging
    +#
    +hdfs.audit.logger=INFO,NullAppender
    +hdfs.audit.log.maxfilesize=256MB
    +hdfs.audit.log.maxbackupindex=20
    +log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
    +log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
    +log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
    +log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
    +log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
    +log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.appender.RFAAUDIT.MaxFileSize=${hdfs.audit.log.maxfilesize}
    +log4j.appender.RFAAUDIT.MaxBackupIndex=${hdfs.audit.log.maxbackupindex}
    +
    +#
    +# NameNode metrics logging.
    +# The default is to retain two namenode-metrics.log files up to 64MB each.
    +#
    +namenode.metrics.logger=INFO,NullAppender
    +log4j.logger.NameNodeMetricsLog=${namenode.metrics.logger}
    +log4j.additivity.NameNodeMetricsLog=false
    +log4j.appender.NNMETRICSRFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.NNMETRICSRFA.File=${hadoop.log.dir}/namenode-metrics.log
    +log4j.appender.NNMETRICSRFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.NNMETRICSRFA.layout.ConversionPattern=%d{ISO8601} %m%n
    +log4j.appender.NNMETRICSRFA.MaxBackupIndex=1
    +log4j.appender.NNMETRICSRFA.MaxFileSize=64MB
    +
    +#
    +# DataNode metrics logging.
    +# The default is to retain two datanode-metrics.log files up to 64MB each.
    +#
    +datanode.metrics.logger=INFO,NullAppender
    +log4j.logger.DataNodeMetricsLog=${datanode.metrics.logger}
    +log4j.additivity.DataNodeMetricsLog=false
    +log4j.appender.DNMETRICSRFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.DNMETRICSRFA.File=${hadoop.log.dir}/datanode-metrics.log
    +log4j.appender.DNMETRICSRFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.DNMETRICSRFA.layout.ConversionPattern=%d{ISO8601} %m%n
    +log4j.appender.DNMETRICSRFA.MaxBackupIndex=1
    +log4j.appender.DNMETRICSRFA.MaxFileSize=64MB
    +
    +#
    +# mapred audit logging
    +#
    +mapred.audit.logger=INFO,NullAppender
    +mapred.audit.log.maxfilesize=256MB
    +mapred.audit.log.maxbackupindex=20
    +log4j.logger.org.apache.hadoop.mapred.AuditLogger=${mapred.audit.logger}
    +log4j.additivity.org.apache.hadoop.mapred.AuditLogger=false
    +log4j.appender.MRAUDIT=org.apache.log4j.RollingFileAppender
    +log4j.appender.MRAUDIT.File=${hadoop.log.dir}/mapred-audit.log
    +log4j.appender.MRAUDIT.layout=org.apache.log4j.PatternLayout
    +log4j.appender.MRAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.appender.MRAUDIT.MaxFileSize=${mapred.audit.log.maxfilesize}
    +log4j.appender.MRAUDIT.MaxBackupIndex=${mapred.audit.log.maxbackupindex}
    +
    +# Custom Logging levels
    +
    +#log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
    +#log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
    +#log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=DEBUG
    +
    +# Jets3t library
    +log4j.logger.org.jets3t.service.impl.rest.httpclient.RestS3Service=ERROR
    +
    +# AWS SDK & S3A FileSystem
    +log4j.logger.com.amazonaws=ERROR
    +log4j.logger.com.amazonaws.http.AmazonHttpClient=ERROR
    +log4j.logger.org.apache.hadoop.fs.s3a.S3AFileSystem=WARN
    +
    +#
    +# Event Counter Appender
    +# Sends counts of logging messages at different severity levels to Hadoop Metrics.
    +#
    +log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
    +
    +#
    +# Job Summary Appender
    +#
    +# Use following logger to send summary to separate file defined by
    +# hadoop.mapreduce.jobsummary.log.file :
    +# hadoop.mapreduce.jobsummary.logger=INFO,JSA
    +#
    +hadoop.mapreduce.jobsummary.logger=${hadoop.root.logger}
    +hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log
    +hadoop.mapreduce.jobsummary.log.maxfilesize=256MB
    +hadoop.mapreduce.jobsummary.log.maxbackupindex=20
    +log4j.appender.JSA=org.apache.log4j.RollingFileAppender
    +log4j.appender.JSA.File=${hadoop.log.dir}/${hadoop.mapreduce.jobsummary.log.file}
    +log4j.appender.JSA.MaxFileSize=${hadoop.mapreduce.jobsummary.log.maxfilesize}
    +log4j.appender.JSA.MaxBackupIndex=${hadoop.mapreduce.jobsummary.log.maxbackupindex}
    +log4j.appender.JSA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.JSA.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
    +log4j.logger.org.apache.hadoop.mapred.JobInProgress$JobSummary=${hadoop.mapreduce.jobsummary.logger}
    +log4j.additivity.org.apache.hadoop.mapred.JobInProgress$JobSummary=false
    +
    +#
    +# shuffle connection log from shuffleHandler
    +# Uncomment the following line to enable logging of shuffle connections
    +# log4j.logger.org.apache.hadoop.mapred.ShuffleHandler.audit=DEBUG
    +
    +#
    +# Yarn ResourceManager Application Summary Log
    +#
    +# Set the ResourceManager summary log filename
    +yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log
    +# Set the ResourceManager summary log level and appender
    +yarn.server.resourcemanager.appsummary.logger=${hadoop.root.logger}
    +#yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY
    +
    +# To enable AppSummaryLogging for the RM,
    +# set yarn.server.resourcemanager.appsummary.logger to
    +# <LEVEL>,RMSUMMARY in hadoop-env.sh
    +
    +# Appender for ResourceManager Application Summary Log
    +# Requires the following properties to be set
    +#    - hadoop.log.dir (Hadoop Log directory)
    +#    - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename)
    +#    - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender)
    +
    +log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
    +log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
    +log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
    +log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
    +log4j.appender.RMSUMMARY.MaxFileSize=256MB
    +log4j.appender.RMSUMMARY.MaxBackupIndex=20
    +log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
    +log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +
    +# HS audit log configs
    +#mapreduce.hs.audit.logger=INFO,HSAUDIT
    +#log4j.logger.org.apache.hadoop.mapreduce.v2.hs.HSAuditLogger=${mapreduce.hs.audit.logger}
    +#log4j.additivity.org.apache.hadoop.mapreduce.v2.hs.HSAuditLogger=false
    +#log4j.appender.HSAUDIT=org.apache.log4j.DailyRollingFileAppender
    +#log4j.appender.HSAUDIT.File=${hadoop.log.dir}/hs-audit.log
    +#log4j.appender.HSAUDIT.layout=org.apache.log4j.PatternLayout
    +#log4j.appender.HSAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +#log4j.appender.HSAUDIT.DatePattern=.yyyy-MM-dd
    +
    +# Http Server Request Logs
    +#log4j.logger.http.requests.namenode=INFO,namenoderequestlog
    +#log4j.appender.namenoderequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    +#log4j.appender.namenoderequestlog.Filename=${hadoop.log.dir}/jetty-namenode-yyyy_mm_dd.log
    +#log4j.appender.namenoderequestlog.RetainDays=3
    +
    +#log4j.logger.http.requests.datanode=INFO,datanoderequestlog
    +#log4j.appender.datanoderequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    +#log4j.appender.datanoderequestlog.Filename=${hadoop.log.dir}/jetty-datanode-yyyy_mm_dd.log
    +#log4j.appender.datanoderequestlog.RetainDays=3
    +
    +#log4j.logger.http.requests.resourcemanager=INFO,resourcemanagerrequestlog
    +#log4j.appender.resourcemanagerrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    +#log4j.appender.resourcemanagerrequestlog.Filename=${hadoop.log.dir}/jetty-resourcemanager-yyyy_mm_dd.log
    +#log4j.appender.resourcemanagerrequestlog.RetainDays=3
    +
    +#log4j.logger.http.requests.jobhistory=INFO,jobhistoryrequestlog
    +#log4j.appender.jobhistoryrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    +#log4j.appender.jobhistoryrequestlog.Filename=${hadoop.log.dir}/jetty-jobhistory-yyyy_mm_dd.log
    +#log4j.appender.jobhistoryrequestlog.RetainDays=3
    +
    +#log4j.logger.http.requests.nodemanager=INFO,nodemanagerrequestlog
    +#log4j.appender.nodemanagerrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    +#log4j.appender.nodemanagerrequestlog.Filename=${hadoop.log.dir}/jetty-nodemanager-yyyy_mm_dd.log
    +#log4j.appender.nodemanagerrequestlog.RetainDays=3
    +
    +# Appender for viewing information for errors and warnings
    +yarn.ewma.cleanupInterval=300
    +yarn.ewma.messageAgeLimitSeconds=86400
    +yarn.ewma.maxUniqueMessages=250
    +log4j.appender.EWMA=org.apache.hadoop.yarn.util.Log4jWarningErrorMetricsAppender
    +log4j.appender.EWMA.cleanupInterval=${yarn.ewma.cleanupInterval}
    +log4j.appender.EWMA.messageAgeLimitSeconds=${yarn.ewma.messageAgeLimitSeconds}
    +log4j.appender.EWMA.maxUniqueMessages=${yarn.ewma.maxUniqueMessages}
    +
    +## NameNode log
    +log4j.appender.NAMENODE_RFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.NAMENODE_RFA.File=${hadoop.log.dir}/hadoop-namenode.log
    +log4j.appender.NAMENODE_RFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.NAMENODE_RFA.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.logger.org.apache.hadoop.hdfs.server.namenode=INFO,NAMENODE_RFA
    +
    +## DataNode log
    +log4j.appender.DATANODE_RFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.DATANODE_RFA.File=${hadoop.log.dir}/hadoop-datanode.log
    +log4j.appender.DATANODE_RFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.DATANODE_RFA.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.logger.org.apache.hadoop.hdfs.server.datanode=INFO,DATANODE_RFA
    +
    +## ResourceManager log
    +log4j.appender.RESOURCEMANAGER_RFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.RESOURCEMANAGER_RFA.File=${hadoop.log.dir}/hadoop-resourcemanager.log
    +log4j.appender.RESOURCEMANAGER_RFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.RESOURCEMANAGER_RFA.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.logger.org.apache.hadoop.yarn.server.resourcemanager=INFO,RESOURCEMANAGER_RFA
    +
    +## NodeManager log
    +log4j.appender.NODEMANAGER_RFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.NODEMANAGER_RFA.File=${hadoop.log.dir}/hadoop-nodemanager.log
    +log4j.appender.NODEMANAGER_RFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.NODEMANAGER_RFA.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.logger.org.apache.hadoop.yarn.server.nodemanager=INFO,NODEMANAGER_RFA
    +
    +## HistoryServer log
    +log4j.appender.HISTORYSERVER_RFA=org.apache.log4j.RollingFileAppender
    --- End diff --
    
    most of the stuff in there we don't really need. Removing them


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by dawidwys <gi...@git.apache.org>.
Github user dawidwys commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204057245
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/bootstrap.sh ---
    @@ -0,0 +1,121 @@
    +#!/bin/bash
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +
    +: ${HADOOP_PREFIX:=/usr/local/hadoop}
    +
    +$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
    +
    +rm /tmp/*.pid
    +
    +# installing libraries if any - (resource urls added comma separated to the ACP system variable)
    +cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do  echo == $cp; curl -LO $cp ; done; cd -
    +
    +# kerberos client
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" /etc/krb5.conf
    +sed -i "s/example.com/${DOMAIN_REALM}/g" /etc/krb5.conf
    +
    +# update config files
    +sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
    +sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
    +
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
    +sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
    +sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
    +
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
    +sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
    +sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
    +
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
    +sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
    +sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
    +
    +sed -i "s#/usr/local/hadoop/bin/container-executor#${NM_CONTAINER_EXECUTOR_PATH}#g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
    +
    +# create namenode kerberos principal and keytab
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey hdfs/$(hostname -f)@${KRB_REALM}"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey mapred/$(hostname -f)@${KRB_REALM}"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey yarn/$(hostname -f)@${KRB_REALM}"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey HTTP/$(hostname -f)@${KRB_REALM}"
    +
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k hdfs.keytab hdfs/$(hostname -f) HTTP/$(hostname -f)"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k mapred.keytab mapred/$(hostname -f) HTTP/$(hostname -f)"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k yarn.keytab yarn/$(hostname -f) HTTP/$(hostname -f)"
    +
    +mkdir -p ${KEYTAB_DIR}
    +mv hdfs.keytab ${KEYTAB_DIR}
    +mv mapred.keytab ${KEYTAB_DIR}
    +mv yarn.keytab ${KEYTAB_DIR}
    +chmod 400 ${KEYTAB_DIR}/hdfs.keytab
    +chmod 400 ${KEYTAB_DIR}/mapred.keytab
    +chmod 400 ${KEYTAB_DIR}/yarn.keytab
    +chown hdfs:hadoop ${KEYTAB_DIR}/hdfs.keytab
    +chown mapred:hadoop ${KEYTAB_DIR}/mapred.keytab
    +chown yarn:hadoop ${KEYTAB_DIR}/yarn.keytab
    +
    +service ssh start
    +
    +if [ "$1" == "--help" -o "$1" == "-h" ]; then
    +    echo "Usage: $(basename $0) (master|worker)"
    +    exit 0
    +elif [ "$1" == "master" ]; then
    +    yes| sudo -E -u hdfs $HADOOP_PREFIX/bin/hdfs namenode -format
    +
    +    nohup sudo -E -u hdfs $HADOOP_PREFIX/bin/hdfs namenode 2>> /var/log/hadoop/namenode.err >> /var/log/hadoop/namenode.out &
    +    nohup sudo -E -u yarn $HADOOP_PREFIX/bin/yarn resourcemanager 2>> /var/log/hadoop/resourcemanager.err >> /var/log/hadoop/resourcemanager.out &
    +    nohup sudo -E -u yarn $HADOOP_PREFIX/bin/yarn timelineserver 2>> /var/log/hadoop/timelineserver.err >> /var/log/hadoop/timelineserver.out &
    +    nohup sudo -E -u mapred $HADOOP_PREFIX/bin/mapred historyserver 2>> /var/log/hadoop/historyserver.err >> /var/log/hadoop/historyserver.out &
    +
    +
    +    kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey root@${KRB_REALM}"
    +    kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k /root/root.keytab root"
    +
    +    kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -pw hadoop-user hadoop-user@${KRB_REALM}"
    +    kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k /home/hadoop-user/hadoop-user.keytab hadoop-user"
    +    chown hadoop-user:hadoop-user /home/hadoop-user/hadoop-user
    +
    +    kinit -kt /root/root.keytab root
    +
    +    hdfs dfsadmin -safemode wait
    +    while [ $? -ne 0 ]; do hdfs dfsadmin -safemode wait; done
    +
    +    hdfs dfs -chown hdfs:hadoop /
    +    hdfs dfs -chmod 755 /
    +    hdfs dfs -mkdir /tmp
    +    hdfs dfs -chown hdfs:hadoop /tmp
    +    hdfs dfs -chmod -R 1777 /tmp
    +    hdfs dfs -mkdir /tmp/logs
    +    hdfs dfs -chown yarn:hadoop /tmp/logs
    +    hdfs dfs -chmod 1777 /tmp/logs
    +
    +    hdfs dfs -mkdir -p /user/hadoop-user
    +    hdfs dfs -chown hadoop-user:hadoop-user /user/hadoop-user
    +
    +    kdestroy
    +
    +    while true; do sleep 1000; done
    +elif [ "$1" == "worker" ]; then
    +    nohup sudo -E -u hdfs $HADOOP_PREFIX/bin/hdfs datanode 2>> /var/log/hadoop/datanode.err >> /var/log/hadoop/datanode.out &
    +    nohup sudo -E -u yarn $HADOOP_PREFIX/bin/yarn nodemanager 2>> /var/log/hadoop/nodemanager.err >> /var/log/hadoop/nodemanager.out &
    +    while true; do sleep 1000; done
    +elif [ $1 == "bash" ]; then
    --- End diff --
    
    Is the if necessary?


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203972431
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/config/log4j.properties ---
    @@ -0,0 +1,354 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +
    +# Define some default values that can be overridden by system properties
    +hadoop.root.logger=INFO,console
    +hadoop.log.dir=.
    +hadoop.log.file=hadoop.log
    +
    +# Define the root logger to the system property "hadoop.root.logger".
    +log4j.rootLogger=${hadoop.root.logger}, EventCounter
    +
    +# Logging Threshold
    +log4j.threshold=ALL
    +
    +# Null Appender
    +log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
    +
    +#
    +# Rolling File Appender - cap space usage at 5gb.
    +#
    +hadoop.log.maxfilesize=256MB
    +hadoop.log.maxbackupindex=20
    +log4j.appender.RFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
    +
    +log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
    +log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
    +
    +log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
    +
    +# Pattern format: Date LogLevel LoggerName LogMessage
    +log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    +# Debugging Pattern format
    +#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
    +
    +
    +#
    +# Daily Rolling File Appender
    +#
    +
    +log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
    +log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
    +
    +# Rollover at midnight
    +log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
    +
    +log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
    +
    +# Pattern format: Date LogLevel LoggerName LogMessage
    +log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    +# Debugging Pattern format
    +#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
    +
    +
    +#
    +# console
    +# Add "console" to rootlogger above if you want to use this
    +#
    +
    +log4j.appender.console=org.apache.log4j.ConsoleAppender
    +log4j.appender.console.target=System.err
    +log4j.appender.console.layout=org.apache.log4j.PatternLayout
    +log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
    +
    +#
    +# TaskLog Appender
    +#
    +
    +#Default values
    +hadoop.tasklog.taskid=null
    +hadoop.tasklog.iscleanup=false
    +hadoop.tasklog.noKeepSplits=4
    +hadoop.tasklog.totalLogFileSize=100
    +hadoop.tasklog.purgeLogSplits=true
    +hadoop.tasklog.logsRetainHours=12
    +
    +log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
    +log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
    +log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
    +log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
    +
    +log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    +
    +#
    +# HDFS block state change log from block manager
    +#
    +# Uncomment the following to log normal block state change
    +# messages from BlockManager in NameNode.
    +#log4j.logger.BlockStateChange=DEBUG
    +
    +#
    +#Security appender
    +#
    +hadoop.security.logger=INFO,NullAppender
    +hadoop.security.log.maxfilesize=256MB
    +hadoop.security.log.maxbackupindex=20
    +log4j.category.SecurityLogger=${hadoop.security.logger}
    +hadoop.security.log.file=SecurityAuth-${user.name}.audit
    +log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
    +log4j.appender.RFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
    +log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
    +log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    +log4j.appender.RFAS.MaxFileSize=${hadoop.security.log.maxfilesize}
    +log4j.appender.RFAS.MaxBackupIndex=${hadoop.security.log.maxbackupindex}
    +
    +#
    +# Daily Rolling Security appender
    +#
    +log4j.appender.DRFAS=org.apache.log4j.DailyRollingFileAppender
    +log4j.appender.DRFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
    +log4j.appender.DRFAS.layout=org.apache.log4j.PatternLayout
    +log4j.appender.DRFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    +log4j.appender.DRFAS.DatePattern=.yyyy-MM-dd
    +
    +#
    +# hadoop configuration logging
    +#
    +
    +# Uncomment the following line to turn off configuration deprecation warnings.
    +# log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN
    +
    +#
    +# hdfs audit logging
    +#
    +hdfs.audit.logger=INFO,NullAppender
    +hdfs.audit.log.maxfilesize=256MB
    +hdfs.audit.log.maxbackupindex=20
    +log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
    +log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
    +log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
    +log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
    +log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
    +log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.appender.RFAAUDIT.MaxFileSize=${hdfs.audit.log.maxfilesize}
    +log4j.appender.RFAAUDIT.MaxBackupIndex=${hdfs.audit.log.maxbackupindex}
    +
    +#
    +# NameNode metrics logging.
    +# The default is to retain two namenode-metrics.log files up to 64MB each.
    +#
    +namenode.metrics.logger=INFO,NullAppender
    +log4j.logger.NameNodeMetricsLog=${namenode.metrics.logger}
    +log4j.additivity.NameNodeMetricsLog=false
    +log4j.appender.NNMETRICSRFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.NNMETRICSRFA.File=${hadoop.log.dir}/namenode-metrics.log
    +log4j.appender.NNMETRICSRFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.NNMETRICSRFA.layout.ConversionPattern=%d{ISO8601} %m%n
    +log4j.appender.NNMETRICSRFA.MaxBackupIndex=1
    +log4j.appender.NNMETRICSRFA.MaxFileSize=64MB
    +
    +#
    +# DataNode metrics logging.
    +# The default is to retain two datanode-metrics.log files up to 64MB each.
    +#
    +datanode.metrics.logger=INFO,NullAppender
    +log4j.logger.DataNodeMetricsLog=${datanode.metrics.logger}
    +log4j.additivity.DataNodeMetricsLog=false
    +log4j.appender.DNMETRICSRFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.DNMETRICSRFA.File=${hadoop.log.dir}/datanode-metrics.log
    +log4j.appender.DNMETRICSRFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.DNMETRICSRFA.layout.ConversionPattern=%d{ISO8601} %m%n
    +log4j.appender.DNMETRICSRFA.MaxBackupIndex=1
    +log4j.appender.DNMETRICSRFA.MaxFileSize=64MB
    +
    +#
    +# mapred audit logging
    +#
    +mapred.audit.logger=INFO,NullAppender
    +mapred.audit.log.maxfilesize=256MB
    +mapred.audit.log.maxbackupindex=20
    +log4j.logger.org.apache.hadoop.mapred.AuditLogger=${mapred.audit.logger}
    +log4j.additivity.org.apache.hadoop.mapred.AuditLogger=false
    +log4j.appender.MRAUDIT=org.apache.log4j.RollingFileAppender
    +log4j.appender.MRAUDIT.File=${hadoop.log.dir}/mapred-audit.log
    +log4j.appender.MRAUDIT.layout=org.apache.log4j.PatternLayout
    +log4j.appender.MRAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.appender.MRAUDIT.MaxFileSize=${mapred.audit.log.maxfilesize}
    +log4j.appender.MRAUDIT.MaxBackupIndex=${mapred.audit.log.maxbackupindex}
    +
    +# Custom Logging levels
    +
    +#log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
    +#log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
    +#log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=DEBUG
    +
    +# Jets3t library
    +log4j.logger.org.jets3t.service.impl.rest.httpclient.RestS3Service=ERROR
    +
    +# AWS SDK & S3A FileSystem
    +log4j.logger.com.amazonaws=ERROR
    +log4j.logger.com.amazonaws.http.AmazonHttpClient=ERROR
    +log4j.logger.org.apache.hadoop.fs.s3a.S3AFileSystem=WARN
    +
    +#
    +# Event Counter Appender
    +# Sends counts of logging messages at different severity levels to Hadoop Metrics.
    +#
    +log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
    +
    +#
    +# Job Summary Appender
    +#
    +# Use following logger to send summary to separate file defined by
    +# hadoop.mapreduce.jobsummary.log.file :
    +# hadoop.mapreduce.jobsummary.logger=INFO,JSA
    +#
    +hadoop.mapreduce.jobsummary.logger=${hadoop.root.logger}
    +hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log
    +hadoop.mapreduce.jobsummary.log.maxfilesize=256MB
    +hadoop.mapreduce.jobsummary.log.maxbackupindex=20
    +log4j.appender.JSA=org.apache.log4j.RollingFileAppender
    +log4j.appender.JSA.File=${hadoop.log.dir}/${hadoop.mapreduce.jobsummary.log.file}
    +log4j.appender.JSA.MaxFileSize=${hadoop.mapreduce.jobsummary.log.maxfilesize}
    +log4j.appender.JSA.MaxBackupIndex=${hadoop.mapreduce.jobsummary.log.maxbackupindex}
    +log4j.appender.JSA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.JSA.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
    +log4j.logger.org.apache.hadoop.mapred.JobInProgress$JobSummary=${hadoop.mapreduce.jobsummary.logger}
    +log4j.additivity.org.apache.hadoop.mapred.JobInProgress$JobSummary=false
    +
    +#
    +# shuffle connection log from shuffleHandler
    +# Uncomment the following line to enable logging of shuffle connections
    +# log4j.logger.org.apache.hadoop.mapred.ShuffleHandler.audit=DEBUG
    +
    +#
    +# Yarn ResourceManager Application Summary Log
    +#
    +# Set the ResourceManager summary log filename
    +yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log
    +# Set the ResourceManager summary log level and appender
    +yarn.server.resourcemanager.appsummary.logger=${hadoop.root.logger}
    +#yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY
    +
    +# To enable AppSummaryLogging for the RM,
    +# set yarn.server.resourcemanager.appsummary.logger to
    +# <LEVEL>,RMSUMMARY in hadoop-env.sh
    +
    +# Appender for ResourceManager Application Summary Log
    +# Requires the following properties to be set
    +#    - hadoop.log.dir (Hadoop Log directory)
    +#    - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename)
    +#    - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender)
    +
    +log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
    +log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
    +log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
    +log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
    +log4j.appender.RMSUMMARY.MaxFileSize=256MB
    +log4j.appender.RMSUMMARY.MaxBackupIndex=20
    +log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
    +log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +
    +# HS audit log configs
    +#mapreduce.hs.audit.logger=INFO,HSAUDIT
    +#log4j.logger.org.apache.hadoop.mapreduce.v2.hs.HSAuditLogger=${mapreduce.hs.audit.logger}
    +#log4j.additivity.org.apache.hadoop.mapreduce.v2.hs.HSAuditLogger=false
    +#log4j.appender.HSAUDIT=org.apache.log4j.DailyRollingFileAppender
    +#log4j.appender.HSAUDIT.File=${hadoop.log.dir}/hs-audit.log
    +#log4j.appender.HSAUDIT.layout=org.apache.log4j.PatternLayout
    +#log4j.appender.HSAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +#log4j.appender.HSAUDIT.DatePattern=.yyyy-MM-dd
    +
    +# Http Server Request Logs
    +#log4j.logger.http.requests.namenode=INFO,namenoderequestlog
    +#log4j.appender.namenoderequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    +#log4j.appender.namenoderequestlog.Filename=${hadoop.log.dir}/jetty-namenode-yyyy_mm_dd.log
    +#log4j.appender.namenoderequestlog.RetainDays=3
    +
    +#log4j.logger.http.requests.datanode=INFO,datanoderequestlog
    +#log4j.appender.datanoderequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    +#log4j.appender.datanoderequestlog.Filename=${hadoop.log.dir}/jetty-datanode-yyyy_mm_dd.log
    +#log4j.appender.datanoderequestlog.RetainDays=3
    +
    +#log4j.logger.http.requests.resourcemanager=INFO,resourcemanagerrequestlog
    +#log4j.appender.resourcemanagerrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    +#log4j.appender.resourcemanagerrequestlog.Filename=${hadoop.log.dir}/jetty-resourcemanager-yyyy_mm_dd.log
    +#log4j.appender.resourcemanagerrequestlog.RetainDays=3
    +
    +#log4j.logger.http.requests.jobhistory=INFO,jobhistoryrequestlog
    +#log4j.appender.jobhistoryrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    +#log4j.appender.jobhistoryrequestlog.Filename=${hadoop.log.dir}/jetty-jobhistory-yyyy_mm_dd.log
    +#log4j.appender.jobhistoryrequestlog.RetainDays=3
    +
    +#log4j.logger.http.requests.nodemanager=INFO,nodemanagerrequestlog
    +#log4j.appender.nodemanagerrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    +#log4j.appender.nodemanagerrequestlog.Filename=${hadoop.log.dir}/jetty-nodemanager-yyyy_mm_dd.log
    +#log4j.appender.nodemanagerrequestlog.RetainDays=3
    +
    +# Appender for viewing information for errors and warnings
    +yarn.ewma.cleanupInterval=300
    +yarn.ewma.messageAgeLimitSeconds=86400
    +yarn.ewma.maxUniqueMessages=250
    +log4j.appender.EWMA=org.apache.hadoop.yarn.util.Log4jWarningErrorMetricsAppender
    +log4j.appender.EWMA.cleanupInterval=${yarn.ewma.cleanupInterval}
    +log4j.appender.EWMA.messageAgeLimitSeconds=${yarn.ewma.messageAgeLimitSeconds}
    +log4j.appender.EWMA.maxUniqueMessages=${yarn.ewma.maxUniqueMessages}
    +
    +## NameNode log
    +log4j.appender.NAMENODE_RFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.NAMENODE_RFA.File=${hadoop.log.dir}/hadoop-namenode.log
    +log4j.appender.NAMENODE_RFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.NAMENODE_RFA.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.logger.org.apache.hadoop.hdfs.server.namenode=INFO,NAMENODE_RFA
    +
    +## DataNode log
    +log4j.appender.DATANODE_RFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.DATANODE_RFA.File=${hadoop.log.dir}/hadoop-datanode.log
    +log4j.appender.DATANODE_RFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.DATANODE_RFA.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.logger.org.apache.hadoop.hdfs.server.datanode=INFO,DATANODE_RFA
    +
    +## ResourceManager log
    +log4j.appender.RESOURCEMANAGER_RFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.RESOURCEMANAGER_RFA.File=${hadoop.log.dir}/hadoop-resourcemanager.log
    +log4j.appender.RESOURCEMANAGER_RFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.RESOURCEMANAGER_RFA.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.logger.org.apache.hadoop.yarn.server.resourcemanager=INFO,RESOURCEMANAGER_RFA
    +
    +## NodeManager log
    +log4j.appender.NODEMANAGER_RFA=org.apache.log4j.RollingFileAppender
    +log4j.appender.NODEMANAGER_RFA.File=${hadoop.log.dir}/hadoop-nodemanager.log
    +log4j.appender.NODEMANAGER_RFA.layout=org.apache.log4j.PatternLayout
    +log4j.appender.NODEMANAGER_RFA.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    +log4j.logger.org.apache.hadoop.yarn.server.nodemanager=INFO,NODEMANAGER_RFA
    +
    +## HistoryServer log
    +log4j.appender.HISTORYSERVER_RFA=org.apache.log4j.RollingFileAppender
    --- End diff --
    
    do we actually need all this?


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204315672
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    +
    +RUN addgroup hadoop
    +RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
    +RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
    +RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
    +
    +RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
    +
    +# install dev tools
    +RUN apt-get update
    +RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
    +
    +# Kerberos client
    +RUN apt-get install krb5-user -y
    +RUN mkdir -p /var/log/kerberos
    +RUN touch /var/log/kerberos/kadmind.log
    +
    +# passwordless ssh
    +RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
    +RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
    +RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
    +
    +# java
    +RUN mkdir -p /usr/java/default && \
    +     curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
    +     tar --strip-components=1 -xz -C /usr/java/default/
    +
    +ENV JAVA_HOME /usr/java/default
    +ENV PATH $PATH:$JAVA_HOME/bin
    +
    +RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
    +RUN unzip jce_policy-8.zip
    +RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
    +
    +ENV HADOOP_VERSION=2.8.4
    +
    +# ENV HADOOP_URL https://www.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
    +ENV HADOOP_URL http://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
    +RUN set -x \
    +    && curl -fSL "$HADOOP_URL" -o /tmp/hadoop.tar.gz \
    +    && tar -xf /tmp/hadoop.tar.gz -C /usr/local/ \
    +    && rm /tmp/hadoop.tar.gz*
    +
    +WORKDIR /usr/local
    +RUN ln -s /usr/local/hadoop-${HADOOP_VERSION} /usr/local/hadoop
    +RUN chown root:root -R /usr/local/hadoop-${HADOOP_VERSION}/
    +RUN chown root:root -R /usr/local/hadoop/
    +RUN chown root:yarn /usr/local/hadoop/bin/container-executor
    +RUN chmod 6050 /usr/local/hadoop/bin/container-executor
    +RUN mkdir -p /hadoop-data/nm-local-dirs
    +RUN mkdir -p /hadoop-data/nm-log-dirs
    +RUN chown yarn:yarn /hadoop-data
    +RUN chown yarn:yarn /hadoop-data/nm-local-dirs
    +RUN chown yarn:yarn /hadoop-data/nm-log-dirs
    +RUN chmod 755 /hadoop-data
    +RUN chmod 755 /hadoop-data/nm-local-dirs
    +RUN chmod 755 /hadoop-data/nm-log-dirs
    +
    +
    +ENV HADOOP_HOME /usr/local/hadoop
    +ENV HADOOP_COMMON_HOME /usr/local/hadoop
    +ENV HADOOP_HDFS_HOME /usr/local/hadoop
    +ENV HADOOP_MAPRED_HOME /usr/local/hadoop
    +ENV HADOOP_YARN_HOME /usr/local/hadoop
    +ENV HADOOP_CONF_DIR /usr/local/hadoop/etc/hadoop
    +ENV YARN_CONF_DIR /usr/local/hadoop/etc/hadoop
    +ENV HADOOP_LOG_DIR /var/log/hadoop
    +ENV HADOOP_BIN_HOME $HADOOP_HOME/bin
    +ENV PATH $PATH:$HADOOP_BIN_HOME
    +
    +ENV KRB_REALM EXAMPLE.COM
    +ENV DOMAIN_REALM example.com
    +ENV KERBEROS_ADMIN admin/admin
    +ENV KERBEROS_ADMIN_PASSWORD admin
    +ENV KEYTAB_DIR /etc/security/keytabs
    +
    +RUN mkdir /var/log/hadoop
    +
    +ADD config/core-site.xml $HADOOP_HOME/etc/hadoop/core-site.xml
    +ADD config/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml
    +ADD config/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml
    +ADD config/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml
    +ADD config/container-executor.cfg $HADOOP_HOME/etc/hadoop/container-executor.cfg
    +RUN chmod 400 $HADOOP_HOME/etc/hadoop/container-executor.cfg
    +RUN chown root:yarn $HADOOP_HOME/etc/hadoop/container-executor.cfg
    +# ADD config/log4j.properties $HADOOP_HOME/etc/hadoop/log4j.properties
    +ADD config/krb5.conf /etc/krb5.conf
    +ADD config/ssl-server.xml $HADOOP_HOME/etc/hadoop/ssl-server.xml
    +ADD config/ssl-client.xml $HADOOP_HOME/etc/hadoop/ssl-client.xml
    +ADD config/keystore.jks $HADOOP_HOME/lib/keystore.jks
    +
    +ADD config/ssh_config /root/.ssh/config
    +RUN chmod 600 /root/.ssh/config
    +RUN chown root:root /root/.ssh/config
    +
    +# workingaround docker.io build error
    +RUN ls -la /usr/local/hadoop/etc/hadoop/*-env.sh
    +RUN chmod +x /usr/local/hadoop/etc/hadoop/*-env.sh
    +RUN ls -la /usr/local/hadoop/etc/hadoop/*-env.sh
    +
    +# fix the 254 error code
    +RUN sed  -i "/^[^#]*UsePAM/ s/.*/#&/"  /etc/ssh/sshd_config
    +RUN echo "UsePAM no" >> /etc/ssh/sshd_config
    +RUN echo "Port 2122" >> /etc/ssh/sshd_config
    +
    +RUN service ssh start
    --- End diff --
    
    I don't know myself why this was in there 😅 was in the image I based this on
    
    removing


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by dawidwys <gi...@git.apache.org>.
Github user dawidwys commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203982453
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml ---
    @@ -0,0 +1,87 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +version: '3.5'
    +
    +networks:
    +  docker-hadoop-cluster-network:
    --- End diff --
    
    Do we need bridged network?


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by dawidwys <gi...@git.apache.org>.
Github user dawidwys commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204057939
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/docker-compose.yml ---
    @@ -0,0 +1,87 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +version: '3.5'
    +
    +networks:
    +  docker-hadoop-cluster-network:
    +    driver: bridge
    +    name: docker-hadoop-cluster-network
    +
    +services:
    +  kdc:
    +    container_name: "kdc"
    +    hostname: kdc.kerberos.com
    +    image: sequenceiq/kerberos
    +    networks:
    +      - docker-hadoop-cluster-network
    +    environment:
    +      REALM: EXAMPLE.COM
    +      DOMAIN_REALM: kdc.kerberos.com
    +
    +  master:
    +    image: ${DOCKER_HADOOP_IMAGE_NAME:-flink/docker-hadoop-secure-cluster:latest}
    +    command: master
    +    depends_on:
    +      - kdc
    +    ports:
    +      - "50070:50070"
    --- End diff --
    
    I think we do not need to expose ports to the host. We run the flink job from within container anyway.


---

[GitHub] flink issue #6377: [FLINK-8981] Add end-to-end test for running on YARN with...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the issue:

    https://github.com/apache/flink/pull/6377
  
    I also ran the new version on `flink-ci`: https://travis-ci.org/aljoscha/flink-ci/builds/406269018


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204312035
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    +
    +RUN addgroup hadoop
    +RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
    +RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
    +RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
    +
    +RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
    +
    +# install dev tools
    +RUN apt-get update
    --- End diff --
    
    fixing


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203969501
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/README.md ---
    @@ -0,0 +1,118 @@
    +# Apache Hadoop Docker image with Kerberos enabled
    +
    +This image is modified version of Knappek/docker-hadoop-secure
    + * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +
    +With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    + * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +
    +And a lot of added stuff for making this an actual, properly configured, kerberized cluster with proper user/permissions structure.
    +
    +Versions
    +--------
    +
    +* JDK8
    +* Hadoop 2.8.3
    +
    +Default Environment Variables
    +-----------------------------
    +
    +| Name | Value | Description |
    +| ---- | ----  | ---- |
    +| `KRB_REALM` | `EXAMPLE.COM` | The Kerberos Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
    +| `DOMAIN_REALM` | `example.com` | The Kerberos Domain Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
    +| `KERBEROS_ADMIN` | `admin/admin` | The KDC admin user |
    +| `KERBEROS_ADMIN_PASSWORD` | `admin` | The KDC admin password |
    +
    +You can simply define these variables in the `docker-compose.yml`.
    +
    +Run image
    +---------
    +
    +Clone the [Github project](https://github.com/aljoscha/docker-hadoop-secure-cluster) and run
    --- End diff --
    
    point to apache repo instead


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r203998139
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    +
    +RUN addgroup hadoop
    +RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
    +RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
    +RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
    +
    +RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
    +
    +# install dev tools
    +RUN apt-get update
    +RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
    +
    +# Kerberos client
    +RUN apt-get install krb5-user -y
    +RUN mkdir -p /var/log/kerberos
    +RUN touch /var/log/kerberos/kadmind.log
    +
    +# passwordless ssh
    +RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
    +RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
    +RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
    +
    +# java
    +RUN mkdir -p /usr/java/default && \
    +     curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
    +     tar --strip-components=1 -xz -C /usr/java/default/
    +
    +ENV JAVA_HOME /usr/java/default
    +ENV PATH $PATH:$JAVA_HOME/bin
    +
    +RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
    +RUN unzip jce_policy-8.zip
    +RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
    +
    +ENV HADOOP_VERSION=2.8.4
    --- End diff --
    
    I agree, but for now we still have to ensure that the hadoop version in flink-dist matches, no?


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by dawidwys <gi...@git.apache.org>.
Github user dawidwys commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204020749
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/README.md ---
    @@ -0,0 +1,118 @@
    +# Apache Hadoop Docker image with Kerberos enabled
    +
    +This image is modified version of Knappek/docker-hadoop-secure
    + * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +
    +With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    + * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +
    +And a lot of added stuff for making this an actual, properly configured, kerberized cluster with proper user/permissions structure.
    +
    +Versions
    +--------
    +
    +* JDK8
    +* Hadoop 2.8.3
    +
    +Default Environment Variables
    +-----------------------------
    +
    +| Name | Value | Description |
    +| ---- | ----  | ---- |
    +| `KRB_REALM` | `EXAMPLE.COM` | The Kerberos Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
    +| `DOMAIN_REALM` | `example.com` | The Kerberos Domain Realm, more information [here](https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#) |
    +| `KERBEROS_ADMIN` | `admin/admin` | The KDC admin user |
    +| `KERBEROS_ADMIN_PASSWORD` | `admin` | The KDC admin password |
    +
    +You can simply define these variables in the `docker-compose.yml`.
    +
    +Run image
    +---------
    +
    +Clone the [Github project](https://github.com/aljoscha/docker-hadoop-secure-cluster) and run
    +
    +```
    +docker-compose up
    +```
    +
    +Usage
    +-----
    +
    +Get the container name with `docker ps` and login to the container with
    +
    +```
    +docker exec -it <container-name> /bin/bash
    +```
    +
    +
    +To obtain a Kerberos ticket, execute
    +
    +```
    +kinit -kt /home/hadoop-user/hadoop-user.keytab hadoop-user
    +```
    +
    +Afterwards you can use `hdfs` CLI like
    +
    +```
    +hdfs dfs -ls /
    +```
    +
    +
    +Known issues
    +------------
    +
    +### Unable to obtain Kerberos password
    +
    +#### Error
    +docker-compose up fails for the first time with the error
    +
    +```
    +Login failure for nn/hadoop.docker.com@EXAMPLE.COM from keytab /etc/security/keytabs/nn.service.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user
    +```
    +
    +#### Solution
    +
    +Stop the containers with `docker-compose down` and start again with `docker-compose up -d`.
    +
    +
    +### JDK 8
    +
    +Make sure you use download a JDK version that is still available. Old versions can be deprecated by Oracle and thus the download link won't be able anymore.
    +
    +Get the latest JDK8 Download URL with
    +
    +```
    +curl -s https://lv.binarybabel.org/catalog-api/java/jdk8.json
    +```
    +
    +### Java Keystore
    +
    +If the Keystroe has been expired, then create a new `keystore.jks`:
    --- End diff --
    
    Keystroe -> Keystore
    
    Won't it be a problem in tests? Will the test start failing one day because of the keystore expired?


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204327419
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/bootstrap.sh ---
    @@ -0,0 +1,121 @@
    +#!/bin/bash
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +
    +: ${HADOOP_PREFIX:=/usr/local/hadoop}
    +
    +$HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
    +
    +rm /tmp/*.pid
    +
    +# installing libraries if any - (resource urls added comma separated to the ACP system variable)
    +cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do  echo == $cp; curl -LO $cp ; done; cd -
    +
    +# kerberos client
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" /etc/krb5.conf
    +sed -i "s/example.com/${DOMAIN_REALM}/g" /etc/krb5.conf
    +
    +# update config files
    +sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
    +sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/core-site.xml
    +
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
    +sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
    +sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
    +
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
    +sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
    +sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
    +
    +sed -i "s/EXAMPLE.COM/${KRB_REALM}/g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
    +sed -i "s/HOSTNAME/$(hostname -f)/g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
    +sed -i "s#/etc/security/keytabs#${KEYTAB_DIR}#g" $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
    +
    +sed -i "s#/usr/local/hadoop/bin/container-executor#${NM_CONTAINER_EXECUTOR_PATH}#g" $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
    +
    +# create namenode kerberos principal and keytab
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey hdfs/$(hostname -f)@${KRB_REALM}"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey mapred/$(hostname -f)@${KRB_REALM}"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey yarn/$(hostname -f)@${KRB_REALM}"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey HTTP/$(hostname -f)@${KRB_REALM}"
    +
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k hdfs.keytab hdfs/$(hostname -f) HTTP/$(hostname -f)"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k mapred.keytab mapred/$(hostname -f) HTTP/$(hostname -f)"
    +kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k yarn.keytab yarn/$(hostname -f) HTTP/$(hostname -f)"
    +
    +mkdir -p ${KEYTAB_DIR}
    +mv hdfs.keytab ${KEYTAB_DIR}
    +mv mapred.keytab ${KEYTAB_DIR}
    +mv yarn.keytab ${KEYTAB_DIR}
    +chmod 400 ${KEYTAB_DIR}/hdfs.keytab
    +chmod 400 ${KEYTAB_DIR}/mapred.keytab
    +chmod 400 ${KEYTAB_DIR}/yarn.keytab
    +chown hdfs:hadoop ${KEYTAB_DIR}/hdfs.keytab
    +chown mapred:hadoop ${KEYTAB_DIR}/mapred.keytab
    +chown yarn:hadoop ${KEYTAB_DIR}/yarn.keytab
    +
    +service ssh start
    +
    +if [ "$1" == "--help" -o "$1" == "-h" ]; then
    +    echo "Usage: $(basename $0) (master|worker)"
    +    exit 0
    +elif [ "$1" == "master" ]; then
    +    yes| sudo -E -u hdfs $HADOOP_PREFIX/bin/hdfs namenode -format
    +
    +    nohup sudo -E -u hdfs $HADOOP_PREFIX/bin/hdfs namenode 2>> /var/log/hadoop/namenode.err >> /var/log/hadoop/namenode.out &
    +    nohup sudo -E -u yarn $HADOOP_PREFIX/bin/yarn resourcemanager 2>> /var/log/hadoop/resourcemanager.err >> /var/log/hadoop/resourcemanager.out &
    +    nohup sudo -E -u yarn $HADOOP_PREFIX/bin/yarn timelineserver 2>> /var/log/hadoop/timelineserver.err >> /var/log/hadoop/timelineserver.out &
    +    nohup sudo -E -u mapred $HADOOP_PREFIX/bin/mapred historyserver 2>> /var/log/hadoop/historyserver.err >> /var/log/hadoop/historyserver.out &
    +
    +
    +    kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -randkey root@${KRB_REALM}"
    +    kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k /root/root.keytab root"
    +
    +    kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "addprinc -pw hadoop-user hadoop-user@${KRB_REALM}"
    +    kadmin -p ${KERBEROS_ADMIN} -w ${KERBEROS_ADMIN_PASSWORD} -q "xst -k /home/hadoop-user/hadoop-user.keytab hadoop-user"
    +    chown hadoop-user:hadoop-user /home/hadoop-user/hadoop-user
    +
    +    kinit -kt /root/root.keytab root
    +
    +    hdfs dfsadmin -safemode wait
    +    while [ $? -ne 0 ]; do hdfs dfsadmin -safemode wait; done
    +
    +    hdfs dfs -chown hdfs:hadoop /
    +    hdfs dfs -chmod 755 /
    +    hdfs dfs -mkdir /tmp
    +    hdfs dfs -chown hdfs:hadoop /tmp
    +    hdfs dfs -chmod -R 1777 /tmp
    +    hdfs dfs -mkdir /tmp/logs
    +    hdfs dfs -chown yarn:hadoop /tmp/logs
    +    hdfs dfs -chmod 1777 /tmp/logs
    +
    +    hdfs dfs -mkdir -p /user/hadoop-user
    +    hdfs dfs -chown hadoop-user:hadoop-user /user/hadoop-user
    +
    +    kdestroy
    +
    +    while true; do sleep 1000; done
    +elif [ "$1" == "worker" ]; then
    +    nohup sudo -E -u hdfs $HADOOP_PREFIX/bin/hdfs datanode 2>> /var/log/hadoop/datanode.err >> /var/log/hadoop/datanode.out &
    +    nohup sudo -E -u yarn $HADOOP_PREFIX/bin/yarn nodemanager 2>> /var/log/hadoop/nodemanager.err >> /var/log/hadoop/nodemanager.out &
    +    while true; do sleep 1000; done
    +elif [ $1 == "bash" ]; then
    --- End diff --
    
    removing, this was because earlier the setup was meant for more generic/general use cases


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by dawidwys <gi...@git.apache.org>.
Github user dawidwys commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204018916
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    +
    +RUN addgroup hadoop
    +RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
    +RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
    +RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
    +
    +RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
    +
    +# install dev tools
    +RUN apt-get update
    +RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
    +
    +# Kerberos client
    +RUN apt-get install krb5-user -y
    +RUN mkdir -p /var/log/kerberos
    +RUN touch /var/log/kerberos/kadmind.log
    +
    +# passwordless ssh
    +RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
    +RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
    +RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
    +
    +# java
    +RUN mkdir -p /usr/java/default && \
    +     curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
    +     tar --strip-components=1 -xz -C /usr/java/default/
    +
    +ENV JAVA_HOME /usr/java/default
    +ENV PATH $PATH:$JAVA_HOME/bin
    +
    +RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
    +RUN unzip jce_policy-8.zip
    +RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
    +
    +ENV HADOOP_VERSION=2.8.4
    +
    +# ENV HADOOP_URL https://www.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
    +ENV HADOOP_URL http://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
    +RUN set -x \
    +    && curl -fSL "$HADOOP_URL" -o /tmp/hadoop.tar.gz \
    +    && tar -xf /tmp/hadoop.tar.gz -C /usr/local/ \
    +    && rm /tmp/hadoop.tar.gz*
    +
    +WORKDIR /usr/local
    +RUN ln -s /usr/local/hadoop-${HADOOP_VERSION} /usr/local/hadoop
    +RUN chown root:root -R /usr/local/hadoop-${HADOOP_VERSION}/
    +RUN chown root:root -R /usr/local/hadoop/
    +RUN chown root:yarn /usr/local/hadoop/bin/container-executor
    +RUN chmod 6050 /usr/local/hadoop/bin/container-executor
    +RUN mkdir -p /hadoop-data/nm-local-dirs
    +RUN mkdir -p /hadoop-data/nm-log-dirs
    +RUN chown yarn:yarn /hadoop-data
    +RUN chown yarn:yarn /hadoop-data/nm-local-dirs
    +RUN chown yarn:yarn /hadoop-data/nm-log-dirs
    +RUN chmod 755 /hadoop-data
    +RUN chmod 755 /hadoop-data/nm-local-dirs
    +RUN chmod 755 /hadoop-data/nm-log-dirs
    +
    +
    +ENV HADOOP_HOME /usr/local/hadoop
    +ENV HADOOP_COMMON_HOME /usr/local/hadoop
    +ENV HADOOP_HDFS_HOME /usr/local/hadoop
    +ENV HADOOP_MAPRED_HOME /usr/local/hadoop
    +ENV HADOOP_YARN_HOME /usr/local/hadoop
    +ENV HADOOP_CONF_DIR /usr/local/hadoop/etc/hadoop
    +ENV YARN_CONF_DIR /usr/local/hadoop/etc/hadoop
    +ENV HADOOP_LOG_DIR /var/log/hadoop
    +ENV HADOOP_BIN_HOME $HADOOP_HOME/bin
    +ENV PATH $PATH:$HADOOP_BIN_HOME
    +
    +ENV KRB_REALM EXAMPLE.COM
    +ENV DOMAIN_REALM example.com
    +ENV KERBEROS_ADMIN admin/admin
    +ENV KERBEROS_ADMIN_PASSWORD admin
    +ENV KEYTAB_DIR /etc/security/keytabs
    +
    +RUN mkdir /var/log/hadoop
    +
    +ADD config/core-site.xml $HADOOP_HOME/etc/hadoop/core-site.xml
    +ADD config/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml
    +ADD config/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml
    +ADD config/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml
    +ADD config/container-executor.cfg $HADOOP_HOME/etc/hadoop/container-executor.cfg
    +RUN chmod 400 $HADOOP_HOME/etc/hadoop/container-executor.cfg
    +RUN chown root:yarn $HADOOP_HOME/etc/hadoop/container-executor.cfg
    +# ADD config/log4j.properties $HADOOP_HOME/etc/hadoop/log4j.properties
    +ADD config/krb5.conf /etc/krb5.conf
    +ADD config/ssl-server.xml $HADOOP_HOME/etc/hadoop/ssl-server.xml
    +ADD config/ssl-client.xml $HADOOP_HOME/etc/hadoop/ssl-client.xml
    +ADD config/keystore.jks $HADOOP_HOME/lib/keystore.jks
    +
    +ADD config/ssh_config /root/.ssh/config
    +RUN chmod 600 /root/.ssh/config
    +RUN chown root:root /root/.ssh/config
    +
    +# workingaround docker.io build error
    +RUN ls -la /usr/local/hadoop/etc/hadoop/*-env.sh
    +RUN chmod +x /usr/local/hadoop/etc/hadoop/*-env.sh
    +RUN ls -la /usr/local/hadoop/etc/hadoop/*-env.sh
    +
    +# fix the 254 error code
    +RUN sed  -i "/^[^#]*UsePAM/ s/.*/#&/"  /etc/ssh/sshd_config
    +RUN echo "UsePAM no" >> /etc/ssh/sshd_config
    +RUN echo "Port 2122" >> /etc/ssh/sshd_config
    +
    +RUN service ssh start
    --- End diff --
    
    I think it does nothing. Docker does not preserve processes that were run during build.


---

[GitHub] flink pull request #6377: [FLINK-8981] Add end-to-end test for running on YA...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6377#discussion_r204305783
  
    --- Diff: flink-end-to-end-tests/test-scripts/docker-hadoop-secure-cluster/Dockerfile ---
    @@ -0,0 +1,159 @@
    +################################################################################
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +################################################################################
    +#
    +# This image is modified version of Knappek/docker-hadoop-secure
    +#   * Knappek/docker-hadoop-secure <https://github.com/Knappek/docker-hadoop-secure>
    +#
    +# With bits and pieces added from Lewuathe/docker-hadoop-cluster to extend it to start a proper kerberized Hadoop cluster:
    +#   * Lewuathe/docker-hadoop-cluster <https://github.com/Lewuathe/docker-hadoop-cluster>
    +#
    +# Author: Aljoscha Krettek
    +# Date:   2018 May, 15
    +#
    +# Creates multi-node, kerberized Hadoop cluster on Docker
    +
    +FROM sequenceiq/pam:ubuntu-14.04
    +MAINTAINER aljoscha
    +
    +USER root
    +
    +RUN addgroup hadoop
    +RUN useradd -d /home/hdfs -ms /bin/bash -G hadoop -p hdfs hdfs
    +RUN useradd -d /home/yarn -ms /bin/bash -G hadoop -p yarn yarn
    +RUN useradd -d /home/mapred -ms /bin/bash -G hadoop -p mapred mapred
    +
    +RUN useradd -d /home/hadoop-user -ms /bin/bash -p hadoop-user hadoop-user
    +
    +# install dev tools
    +RUN apt-get update
    +RUN apt-get install -y curl tar sudo openssh-server openssh-client rsync unzip
    +
    +# Kerberos client
    +RUN apt-get install krb5-user -y
    +RUN mkdir -p /var/log/kerberos
    +RUN touch /var/log/kerberos/kadmind.log
    +
    +# passwordless ssh
    +RUN rm -f /etc/ssh/ssh_host_dsa_key /etc/ssh/ssh_host_rsa_key /root/.ssh/id_rsa
    +RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
    +RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
    +RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
    +
    +# java
    +RUN mkdir -p /usr/java/default && \
    +     curl -Ls 'http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz' -H 'Cookie: oraclelicense=accept-securebackup-cookie' | \
    +     tar --strip-components=1 -xz -C /usr/java/default/
    +
    +ENV JAVA_HOME /usr/java/default
    +ENV PATH $PATH:$JAVA_HOME/bin
    +
    +RUN curl -LOH 'Cookie: oraclelicense=accept-securebackup-cookie' 'http://download.oracle.com/otn-pub/java/jce/8/jce_policy-8.zip'
    +RUN unzip jce_policy-8.zip
    +RUN cp /UnlimitedJCEPolicyJDK8/local_policy.jar /UnlimitedJCEPolicyJDK8/US_export_policy.jar $JAVA_HOME/jre/lib/security
    +
    +ENV HADOOP_VERSION=2.8.4
    --- End diff --
    
    and I'm running the nightly tests using the `withoutHadoop` profile


---