Posted to commits@flink.apache.org by al...@apache.org on 2019/08/06 15:12:50 UTC

[flink] branch release-1.8 updated (a76b9e9 -> 954f3c0)

This is an automated email from the ASF dual-hosted git repository.

aljoscha pushed a change to branch release-1.8
in repository https://gitbox.apache.org/repos/asf/flink.git.


    from a76b9e9  [FLINK-13394][travis] Use fallback unsafe MapR repository
     new 9441505  [FLINK-10368] Harden Dockerized Kerberos tests by waiting for NM to be up
     new 9881c45  [hotfix] Print Flink logs from YARN in test_yarn_kerberos_docker.sh
     new 954f3c0  [FLINK-10368] Increase slot request timeout to harden YARN/Kerberos test

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../test-scripts/test_yarn_kerberos_docker.sh      | 31 ++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)


[flink] 02/03: [hotfix] Print Flink logs from YARN in test_yarn_kerberos_docker.sh

Posted by al...@apache.org.

aljoscha pushed a commit to branch release-1.8
in repository https://gitbox.apache.org/repos/asf/flink.git

commit 9881c45ea271176d19457b0e2aa98d1b4975f860
Author: Aljoscha Krettek <al...@apache.org>
AuthorDate: Fri Aug 2 14:48:24 2019 +0200

    [hotfix] Print Flink logs from YARN in test_yarn_kerberos_docker.sh
---
 flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh b/flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh
index 528dfed..8f7d676 100755
--- a/flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh
+++ b/flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh
@@ -172,6 +172,13 @@ else
     echo "Docker logs:"
     docker logs master
     exit 1
+
+    echo "Flink logs:"
+    docker exec -it master bash -c "kinit -kt /home/hadoop-user/hadoop-user.keytab hadoop-user"
+    application_id=`docker exec -it master bash -c "yarn application -list -appStates ALL" | grep "Flink session cluster" | awk '{print \$1}'`
+    echo "Application ID: $application_id"
+    docker exec -it master bash -c "yarn logs -applicationId $application_id"
+    docker exec -it master bash -c "kdestroy"
 fi
 
 if [[ ! "$OUTPUT" =~ "consummation,1" ]]; then
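
As the hunk above is shown, the added lines sit after `exit 1` in the same branch, so they would not run unless they are moved in front of it. The following is a minimal sketch of printing diagnostics before failing, not the script's actual code. The helper names `extract_application_id` and `print_logs_and_fail` are hypothetical, and the log command is passed in as arguments so the sketch can be run without Docker or YARN.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `yarn application -list` style output on stdin and
# print the first column of the first line naming the Flink session cluster.
extract_application_id() {
    awk '/Flink session cluster/ { print $1; exit }'
}

# Hypothetical helper: run the diagnostics command given as arguments, then
# report failure. The diagnostics come first because nothing placed after
# `exit` or `return` in the same branch is executed.
print_logs_and_fail() {
    echo "Flink logs:"
    "$@" || echo "Could not fetch logs." >&2
    return 1
}
```

In the test script this could be combined as `print_logs_and_fail docker exec master bash -c "yarn logs -applicationId $application_id" || exit 1`, assuming the `kinit` call has already been made inside the container.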


[flink] 03/03: [FLINK-10368] Increase slot request timeout to harden YARN/Kerberos test

Posted by al...@apache.org.

aljoscha pushed a commit to branch release-1.8
in repository https://gitbox.apache.org/repos/asf/flink.git

commit 954f3c0fb33185c34fca485ccf47a2d0de587d72
Author: Aljoscha Krettek <al...@apache.org>
AuthorDate: Mon Aug 5 10:15:34 2019 +0200

    [FLINK-10368] Increase slot request timeout to harden YARN/Kerberos test
    
    Before, the tests were sometimes failing with
    NoResourceAvailableException. In the logs it was visible that the
    requested TaskExecutors (TMs) were connecting after the exception was
    thrown. Increasing the timeout therefore fixes the instability.
---
 flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh b/flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh
index 8f7d676..f142a37 100755
--- a/flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh
+++ b/flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh
@@ -138,7 +138,7 @@ docker exec -it master bash -c "tar xzf /home/hadoop-user/$FLINK_TARBALL --direc
 # minimal Flink config, bebe
 docker exec -it master bash -c "echo \"security.kerberos.login.keytab: /home/hadoop-user/hadoop-user.keytab\" > /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
 docker exec -it master bash -c "echo \"security.kerberos.login.principal: hadoop-user\" >> /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
-docker exec -it master bash -c "echo \"slot.request.timeout: 60000\" >> /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
+docker exec -it master bash -c "echo \"slot.request.timeout: 120000\" >> /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
 docker exec -it master bash -c "echo \"containerized.heap-cutoff-min: 100\" >> /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
 
 echo "Flink config:"
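
The config lines in the hunk above are built by appending `key: value` pairs to `flink-conf.yaml` with `echo`. The sketch below shows the same pattern as small functions, assuming that `slot.request.timeout` is read as milliseconds, so that the change raises the timeout from 60 to 120 seconds. The helper names `append_config`, `read_config` and `millis_to_seconds` are hypothetical and are not part of the Flink test scripts.

```shell
#!/usr/bin/env bash

# Hypothetical helper: append one "key: value" line to a flat config file.
append_config() {
    local file=$1 key=$2 value=$3
    printf '%s: %s\n' "$key" "$value" >> "$file"
}

# Hypothetical helper: print the last value recorded in the file for a key.
# This only inspects the file; it makes no claim about how Flink resolves
# duplicate keys.
read_config() {
    local file=$1 key=$2
    awk -v key="$key" -F': ' '$1 == key { value = $2 } END { print value }' "$file"
}

# Hypothetical helper: convert a millisecond value to whole seconds.
millis_to_seconds() {
    echo $(( $1 / 1000 ))
}
```

For example, `append_config "$conf" slot.request.timeout 120000` followed by `millis_to_seconds "$(read_config "$conf" slot.request.timeout)"` prints `120`.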


[flink] 01/03: [FLINK-10368] Harden Dockerized Kerberos tests by waiting for NM to be up

Posted by al...@apache.org.

aljoscha pushed a commit to branch release-1.8
in repository https://gitbox.apache.org/repos/asf/flink.git

commit 94415058a3e71ff53b7d3985fa038fa1c4e4aefa
Author: Aljoscha Krettek <al...@apache.org>
AuthorDate: Thu Aug 1 13:04:24 2019 +0200

    [FLINK-10368] Harden Dockerized Kerberos tests by waiting for NM to be up
    
    Before, we didn't wait for Yarn NodeManagers to be up. This meant that
    sometimes the Flink Job would not have enough resources to run.
---
 .../test-scripts/test_yarn_kerberos_docker.sh      | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh b/flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh
index 5f2dea2..528dfed 100755
--- a/flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh
+++ b/flink-end-to-end-tests/test-scripts/test_yarn_kerberos_docker.sh
@@ -61,7 +61,7 @@ function start_hadoop_cluster() {
             return 1
         else
             echo "Waiting for hadoop cluster to come up. We have been trying for $time_diff seconds, retrying ..."
-            sleep 10
+            sleep 5
         fi
     done
 
@@ -74,6 +74,26 @@ function start_hadoop_cluster() {
         return 1
     fi
 
+    # try and see if NodeManagers are up, otherwise the Flink job will not have enough resources
+    # to run
+    nm_running="0"
+    start_time=$(date +%s)
+    while [ "$nm_running" -lt "2" ]; do
+        current_time=$(date +%s)
+        time_diff=$((current_time - start_time))
+
+        if [ $time_diff -ge $MAX_RETRY_SECONDS ]; then
+            return 1
+        else
+            echo "We only have $nm_running NodeManagers up. We have been trying for $time_diff seconds, retrying ..."
+            sleep 1
+        fi
+
+        docker exec -it master bash -c "kinit -kt /home/hadoop-user/hadoop-user.keytab hadoop-user"
+        nm_running=`docker exec -it master bash -c "yarn node -list" | grep RUNNING | wc -l`
+        docker exec -it master bash -c "kdestroy"
+    done
+
     return 0
 }
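
The NodeManager wait loop in the hunk above follows a common pattern: poll a command until enough nodes are reported or a deadline passes. Below is a minimal sketch of that pattern in isolation. It is a variant rather than the committed code: it queries before checking the deadline and uses `grep -c` instead of `grep | wc -l`. The function names are hypothetical, and the listing command is passed in as arguments so the loop can be tried without Docker, Kerberos or YARN.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count the lines on stdin that contain RUNNING.
# `grep -c` prints 0 but exits non-zero when nothing matches, hence `|| true`.
count_running_nodes() {
    grep -c RUNNING || true
}

# Hypothetical helper: run the listing command given after the first two
# arguments until at least $1 nodes are RUNNING or $2 seconds have elapsed.
wait_for_running_nodes() {
    local wanted=$1 max_seconds=$2
    shift 2
    local start_time running
    start_time=$(date +%s)
    while true; do
        running=$("$@" | count_running_nodes)
        if [ "$running" -ge "$wanted" ]; then
            return 0
        fi
        if [ $(( $(date +%s) - start_time )) -ge "$max_seconds" ]; then
            echo "Gave up with $running of $wanted NodeManagers running." >&2
            return 1
        fi
        echo "We only have $running NodeManagers up, retrying ..."
        sleep 1
    done
}
```

In the test script the listing command would be something like `docker exec master bash -c "yarn node -list"`, with `$MAX_RETRY_SECONDS` as the deadline.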