Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/02/26 10:22:33 UTC

[GitHub] [flink] rmetzger opened a new pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

rmetzger opened a new pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222
 
 
   ## What is the purpose of the change
   
   This change sets up nightly end-to-end tests that cover special scenarios such as Java 11, Hadoop 2.8, or Scala 2.12.
   
   
   ## Brief change log
   
   - Change `build-apache-repo.yml` to trigger the tests nightly (I left out release branches for now, as only master has the Azure files present).
   - Change `jobs-template.yml` to build the end-to-end tests with every commit (PR and push builds). This is an experiment, as we might not have enough resources (but I believe we do).
   I had two options for this: either transfer the build artifact from the compile phase into the e2e test job, or build there from scratch. I chose the latter, as the build always takes 20 minutes. This way, if the test machines are busy but the Azure-provided machines are available, we start compiling right away.
   The pre-commit tests are executed in this stage, before the end-to-end tests.
   
   This change also fixes many issues with the end to end tests:
    
   **Fix Kubernetes E2E tests**
   
   Problem: Low disk space was causing K8s to mark the kubelet as "full disk", thus Flink did not schedule there.
   Problem: Low disk space let the kubelet delete unused Docker images, including images generated for the test.
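
To illustrate the second problem: the kubelet runs image garbage collection once disk usage reaches `image-gc-high-threshold` (a percentage, 85 by default). The sketch below is only a rough model of that decision, not kubelet code; the function name `image_gc_triggered` and the sample usage figure are assumptions made up for illustration.

```shell
#!/usr/bin/env bash
# Rough model of the kubelet image GC trigger (illustration only).
#   usage_percent:  current disk usage in percent (hypothetical input)
#   high_threshold: value passed as image-gc-high-threshold
image_gc_triggered() {
    local usage_percent=$1
    local high_threshold=$2
    if [ "$usage_percent" -ge "$high_threshold" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

# A CI machine whose disk is 90% full:
image_gc_triggered 90 85   # default threshold: prints "yes"
image_gc_triggered 90 99   # raised threshold:  prints "no"
```

With a threshold of 99, a machine that is merely "quite full" no longer loses the freshly built test images.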
   
   **Fix Kerberized YARN e2e test**
   
   The problem was that YARN was decommissioning NodeManagers because of low disk space.
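
For context: the NodeManager disk health checker marks a local directory as bad once its utilization exceeds `yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage` (90.0 by default). The description does not say which setting this PR changed, so the following is only a sketch of one possible approach; the helper name `yarn_disk_limit_property` and the value 99.0 are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Print a yarn-site.xml <property> element that raises the NodeManager
# disk utilization limit. The percentage is an assumption for this sketch.
yarn_disk_limit_property() {
    local percentage=$1
    printf '<property>\n'
    printf '  <name>%s</name>\n' "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"
    printf '  <value>%s</value>\n' "$percentage"
    printf '</property>\n'
}

yarn_disk_limit_property 99.0
```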
   
   **Fix queryable state e2e test**
   
   Problem: The logging pattern recently changed, which is why the extraction of the port / IP failed.
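
A minimal sketch of a more tolerant extraction, assuming the address appears in the log as `@ /<ip>:<port>`. The sample log line and the helper name `extract_endpoint` are hypothetical; the actual pattern used by the test script is not shown in this thread.

```shell
#!/usr/bin/env bash
# Extract "ip:port" from a log line of the (hypothetical) form
#   ... Server @ /127.0.0.1:9069 ...
# Matching only on the "@ /ip:port" part keeps the extraction independent
# of the timestamp and logger prefix.
extract_endpoint() {
    # -n together with the p flag prints only lines where the pattern matched
    sed -n -E 's#.*@ /([0-9]{1,3}(\.[0-9]{1,3}){3}):([0-9]+).*#\1:\3#p'
}

sample='2020-02-26 10:00:00,000 INFO  SomeLogger - Started Queryable State Proxy Server @ /127.0.0.1:9069.'
endpoint=$(echo "$sample" | extract_endpoint)
echo "ip=${endpoint%:*} port=${endpoint##*:}"
```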
   
   
   ## Verifying this change
   
   This nightly build failed only because of known issues (which also caused the travis nightly test to fail). I believe the structure / environment of the individual nightly tests is fine: https://dev.azure.com/rmetzger/Flink/_build/results?buildId=5590&view=results
   
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
     - The S3 file system connector: (yes / **no** / don't know)
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] flinkbot commented on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#issuecomment-591357656
 
 
   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * baea1576fa2ca73f42fb3460e4ee11c073eddd6e UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384465359
 
 

 ##########
 File path: tools/azure-pipelines/build-apache-repo.yml
 ##########
 @@ -48,3 +58,60 @@ stages:
           e2e_pool_definition:
             vmImage: 'ubuntu-16.04'
           environment: PROFILE="-Dhadoop.version=2.8.3 -Dinclude_hadoop_aws -Dscala-2.11"
+          run_end_to_end: false
+          container: flink-build-jdk8
+  # Special stage for nightly builds:
+  - stage: cron_build
+    displayName: "Cron build"
+    dependsOn: [] # depending on an empty array makes the stages run in parallel
+    condition: or(eq(variables['Build.Reason'], 'Schedule'), eq(variables['MODE'], 'nightly'))
+    jobs:
+      - template: jobs-template.yml
+        parameters:
+          stage_name: cron_build_hadoop241
+          test_pool_definition:
+            name: Default
+          e2e_pool_definition:
+            vmImage: 'ubuntu-16.04'
+          environment: PROFILE="-Dhadoop.version=2.4.1 -Pskip-hive-tests"
+          run_end_to_end: true
+          container: flink-build-jdk8
+      - template: jobs-template.yml
+        parameters:
+          stage_name: cron_build_scala2_12
+          test_pool_definition:
+            name: Default
+          e2e_pool_definition:
+            vmImage: 'ubuntu-16.04'
+          environment: PROFILE="-Dhadoop.version=2.8.3 -Dinclude_hadoop_aws -Dscala-2.12 -Phive-1.2.1"
+          run_end_to_end: true
+          container: flink-build-jdk8
+      - template: jobs-template.yml
+        parameters:
+          stage_name: cron_build_jdk11
+          test_pool_definition:
+            name: Default
+          e2e_pool_definition:
+            vmImage: 'ubuntu-16.04'
+          environment: PROFILE="-Dhadoop.version=2.8.3 -Dinclude_hadoop_aws -Dscala-2.11 -Djdk11"
+          run_end_to_end: true
+          container: flink-build-jdk11
+      - template: jobs-template.yml
+        parameters:
+          stage_name: cron_build_hadoopfree
+          test_pool_definition:
+            name: Default
+          e2e_pool_definition:
+            vmImage: 'ubuntu-16.04'
+          environment: PROFILE=""
+          run_end_to_end: true
+          container: flink-build-jdk8
+      - job: docs_404_check # run on a MSFT provided machine
 
 Review comment:
   it's not a matter of build times but of stability and resource conservation.


[GitHub] [flink] rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384407405
 
 

 ##########
 File path: tools/azure-pipelines/prepare_precommit.sh
 ##########
 @@ -32,8 +32,17 @@ find . -type f -name '*.timestamp' | xargs touch
 export M2_HOME=/home/vsts/maven_cache/apache-maven-3.2.5/ 
 export PATH=/home/vsts/maven_cache/apache-maven-3.2.5/bin:$PATH
 mvn -version
-mvn install --settings ./tools/azure-pipelines/google-mirror-settings.xml -DskipTests -Drat.skip
+MVN_CALL="mvn install --settings ./tools/azure-pipelines/google-mirror-settings.xml -DskipTests -Drat.skip $PROFILE"
+echo "Invoking Maven: '$MVN_CALL'"
+$MVN_CALL
+EXIT_CODE=$?
 
+if [ $EXIT_CODE != 0 ]; then
+	echo "=============================================================================="
+	echo "Build error. Exit code: $EXIT_CODE. Failing build"
+	echo "=============================================================================="
+	exit $EXIT_CODE
+fi
 
 chmod -R +x build-target
 chmod -R +x flink-end-to-end-tests
 
 Review comment:
   I left this file in, even though it is currently not used. I consider always running the e2e tests an experiment. If we need to undo it, this file is still around.
   If the experiment is successful, I will remove it (a cleanup PR is needed anyways once we remove travis support).
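
The exit-code handling added in the hunk above can be reduced to a small helper. This is a sketch only: `run_or_report` is a hypothetical name and not part of the Flink scripts.

```shell
#!/usr/bin/env bash
# Run the given command; on failure print a banner and propagate
# the command's exit code to the caller.
run_or_report() {
    "$@"
    local exit_code=$?
    if [ "$exit_code" -ne 0 ]; then
        echo "Build error. Exit code: $exit_code. Failing build"
    fi
    return "$exit_code"
}

run_or_report true && echo "first command succeeded"
run_or_report false || echo "second command failed as expected"
```

Passing the command as separate arguments (`"$@"`) instead of expanding a single string avoids surprises with word splitting when an argument contains spaces.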


[GitHub] [flink] rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384454430
 
 

 ##########
 File path: tools/azure-pipelines/jobs-template.yml
 ##########
 @@ -132,20 +134,26 @@ jobs:
         restoreKeys: $(CACHE_FALLBACK_KEY)
         path: $(MAVEN_CACHE_FOLDER)
       displayName: Cache Maven local repo
+      continueOnError: true
     
     # download artifacts
     - task: DownloadPipelineArtifact@2
       inputs:
         path: $(CACHE_FLINK_DIR)
         artifact: FlinkCompileCacheDir-${{parameters.stage_name}}
+    - script: |
+        echo "##vso[task.setvariable variable=JAVA_HOME]$(JAVA_HOME_11_X64)"
 
 Review comment:
   This step is only used on the azure machines (where we do not use the docker image).
   That's why I'm changing the java version like this.
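
A sketch of how such a switch can be expressed with Azure Pipelines logging commands, which the agent interprets when they appear on stdout. The helper name `emit_jdk_switch` and the fallback path are assumptions for illustration; `task.prependpath` is used here so that `PATH` does not have to be assembled by hand.

```shell
#!/usr/bin/env bash
# Print the logging commands that switch the JDK for subsequent steps.
emit_jdk_switch() {
    local jdk_home=$1
    # Make JAVA_HOME available to the following steps of the job.
    echo "##vso[task.setvariable variable=JAVA_HOME]${jdk_home}"
    # Put the JDK's bin directory at the front of PATH for the following steps.
    echo "##vso[task.prependpath]${jdk_home}/bin"
}

# JAVA_HOME_11_X64 is set on Microsoft-hosted agents; the fallback path is made up.
emit_jdk_switch "${JAVA_HOME_11_X64:-/usr/lib/jvm/jdk-11}"
```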


[GitHub] [flink] flinkbot edited a comment on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#issuecomment-591357656
 
 
   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "status" : "PENDING",
       "url" : "https://travis-ci.com/flink-ci/flink/builds/150628235",
       "triggerID" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5616",
       "triggerID" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * baea1576fa2ca73f42fb3460e4ee11c073eddd6e Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/150628235) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5616) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] flinkbot edited a comment on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#issuecomment-591353080
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 721fdc15d019f8ba8205962beab7665c45ee91e7 (Fri Feb 28 21:48:22 UTC 2020)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.<details>
    The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>


[GitHub] [flink] zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384440943
 
 

 ##########
 File path: flink-end-to-end-tests/test-scripts/common_kubernetes.sh
 ##########
 @@ -51,10 +51,18 @@ function check_kubernetes_status {
 function start_kubernetes_if_not_running {
     if ! check_kubernetes_status; then
         echo "Starting minikube ..."
-        start_command="minikube start"
         # We need sudo permission to set vm-driver to none in linux os.
-        [[ "${OS_TYPE}" = "linux" ]] && start_command="sudo CHANGE_MINIKUBE_NONE_USER=true ${start_command} --vm-driver=none"
-        ${start_command}
+        if [[ "${OS_TYPE}" = "linux" ]] ; then
+            sudo CHANGE_MINIKUBE_NONE_USER=true minikube start --vm-driver=none \
+                --extra-config=kubelet.image-gc-high-threshold=99 \
+                --extra-config=kubelet.image-gc-low-threshold=98 \
+                --extra-config=kubelet.minimum-container-ttl-duration=120m \
+                --extra-config=kubelet.eviction-hard="memory.available<5Mi,nodefs.available<1Mi,imagefs.available<1Mi" \
+                --extra-config=kubelet.eviction-soft="memory.available<5Mi,nodefs.available<2Mi,imagefs.available<2Mi" \
+                --extra-config=kubelet.eviction-soft-grace-period="memory.available=2h,nodefs.available=2h,imagefs.available=2h"
 
 Review comment:
   how are `image-gc-X-threshold` / `ttl-duration` related to disk space problems?


[GitHub] [flink] zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384446670
 
 

 ##########
 File path: tools/azure-pipelines/jobs-template.yml
 ##########
 @@ -159,9 +167,15 @@ jobs:
         restoreKeys: $(CACHE_FALLBACK_KEY)
         path: $(MAVEN_CACHE_FOLDER)
       displayName: Cache Maven local repo
+      continueOnError: true
+    - script: |
+        echo "##vso[task.setvariable variable=JAVA_HOME]$(JAVA_HOME_11_X64)"
+        echo "##vso[task.setvariable variable=PATH]$(JAVA_HOME_11_X64)/bin;$(PATH)"
+      displayName: "Set to jdk11"
+      condition: eq('${{parameters.container}}', 'flink-build-jdk11')
     - script: ./tools/travis/setup_maven.sh
     - script: ./tools/azure-pipelines/setup_kubernetes.sh
-    - script: M2_HOME=/home/vsts/maven_cache/apache-maven-3.2.5/ PATH=/home/vsts/maven_cache/apache-maven-3.2.5/bin:$PATH PROFILE="-Dinclude-hadoop -Dhadoop.version=2.8.3 -De2e-metrics -Dmaven.wagon.http.pool=false" STAGE=compile ./tools/azure_controller.sh compile
+    - script: M2_HOME=/home/vsts/maven_cache/apache-maven-3.2.5/ PATH=/home/vsts/maven_cache/apache-maven-3.2.5/bin:$PATH ${{parameters.environment}} STAGE=compile ./tools/azure_controller.sh compile
       displayName: Build
     - script: FLINK_DIR=`pwd`/build-target flink-end-to-end-tests/run-nightly-tests.sh
 
 Review comment:
   where are you running the java e2e tests?


[GitHub] [flink] rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384523958
 
 

 ##########
 File path: flink-end-to-end-tests/test-scripts/common_kubernetes.sh
 ##########
 @@ -51,10 +51,18 @@ function check_kubernetes_status {
 function start_kubernetes_if_not_running {
     if ! check_kubernetes_status; then
         echo "Starting minikube ..."
-        start_command="minikube start"
         # We need sudo permission to set vm-driver to none in linux os.
-        [[ "${OS_TYPE}" = "linux" ]] && start_command="sudo CHANGE_MINIKUBE_NONE_USER=true ${start_command} --vm-driver=none"
-        ${start_command}
+        if [[ "${OS_TYPE}" = "linux" ]] ; then
+            sudo CHANGE_MINIKUBE_NONE_USER=true minikube start --vm-driver=none \
+                --extra-config=kubelet.image-gc-high-threshold=99 \
+                --extra-config=kubelet.image-gc-low-threshold=98 \
+                --extra-config=kubelet.minimum-container-ttl-duration=120m \
+                --extra-config=kubelet.eviction-hard="memory.available<5Mi,nodefs.available<1Mi,imagefs.available<1Mi" \
+                --extra-config=kubelet.eviction-soft="memory.available<5Mi,nodefs.available<2Mi,imagefs.available<2Mi" \
+                --extra-config=kubelet.eviction-soft-grace-period="memory.available=2h,nodefs.available=2h,imagefs.available=2h"
 
 Review comment:
   Done


[GitHub] [flink] zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384445114
 
 

 ##########
 File path: tools/azure_controller.sh
 ##########
 @@ -172,6 +172,7 @@ elif [ $STAGE != "$STAGE_CLEANUP" ]; then
         PY_MVN="${MVN// clean/}"
         PY_MVN="$PY_MVN -Drat.skip=true"
         ${PY_MVN}
+        EXIT_CODE=$?
 
 Review comment:
   unrelated


[GitHub] [flink] zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384464925
 
 

 ##########
 File path: flink-end-to-end-tests/test-scripts/common_kubernetes.sh
 ##########
 @@ -51,10 +51,18 @@ function check_kubernetes_status {
 function start_kubernetes_if_not_running {
     if ! check_kubernetes_status; then
         echo "Starting minikube ..."
-        start_command="minikube start"
         # We need sudo permission to set vm-driver to none in linux os.
-        [[ "${OS_TYPE}" = "linux" ]] && start_command="sudo CHANGE_MINIKUBE_NONE_USER=true ${start_command} --vm-driver=none"
-        ${start_command}
+        if [[ "${OS_TYPE}" = "linux" ]] ; then
+            sudo CHANGE_MINIKUBE_NONE_USER=true minikube start --vm-driver=none \
+                --extra-config=kubelet.image-gc-high-threshold=99 \
+                --extra-config=kubelet.image-gc-low-threshold=98 \
+                --extra-config=kubelet.minimum-container-ttl-duration=120m \
+                --extra-config=kubelet.eviction-hard="memory.available<5Mi,nodefs.available<1Mi,imagefs.available<1Mi" \
+                --extra-config=kubelet.eviction-soft="memory.available<5Mi,nodefs.available<2Mi,imagefs.available<2Mi" \
+                --extra-config=kubelet.eviction-soft-grace-period="memory.available=2h,nodefs.available=2h,imagefs.available=2h"
 
 Review comment:
   You say they aren't related to low disk space, but I read the problem you refer to as `Low disk space [caused issues].`
   
   I'm gonna need a longer explanation :/


[GitHub] [flink] rmetzger commented on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
rmetzger commented on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#issuecomment-591448725
 
 
   Thanks a lot. I will merge this change then!


[GitHub] [flink] rmetzger commented on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
rmetzger commented on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#issuecomment-591409764
 
 
   Thank you for your review so far!
   I have addressed all comments, except for the 3/4 Java end-to-end tests and the caching of the docs 404 check.
   Would you be okay with merging this PR as is, so that I can address these two issues in another PR?
   I am not sure if I will manage to implement, test and review those items today, but I would like to have the nightlies and e2e test execution in master asap, because keeping the tests somewhat running / stable is a moving target.


[GitHub] [flink] zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384481057
 
 

 ##########
 File path: flink-end-to-end-tests/test-scripts/common_kubernetes.sh
 ##########
 @@ -51,10 +51,18 @@ function check_kubernetes_status {
 function start_kubernetes_if_not_running {
     if ! check_kubernetes_status; then
         echo "Starting minikube ..."
-        start_command="minikube start"
         # We need sudo permission to set vm-driver to none in linux os.
-        [[ "${OS_TYPE}" = "linux" ]] && start_command="sudo CHANGE_MINIKUBE_NONE_USER=true ${start_command} --vm-driver=none"
-        ${start_command}
+        if [[ "${OS_TYPE}" = "linux" ]] ; then
+            sudo CHANGE_MINIKUBE_NONE_USER=true minikube start --vm-driver=none \
+                --extra-config=kubelet.image-gc-high-threshold=99 \
+                --extra-config=kubelet.image-gc-low-threshold=98 \
+                --extra-config=kubelet.minimum-container-ttl-duration=120m \
+                --extra-config=kubelet.eviction-hard="memory.available<5Mi,nodefs.available<1Mi,imagefs.available<1Mi" \
+                --extra-config=kubelet.eviction-soft="memory.available<5Mi,nodefs.available<2Mi,imagefs.available<2Mi" \
+                --extra-config=kubelet.eviction-soft-grace-period="memory.available=2h,nodefs.available=2h,imagefs.available=2h"
 
 Review comment:
   Thanks for the explanation.
   
   Now document it :P


[GitHub] [flink] flinkbot edited a comment on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#issuecomment-591357656
 
 
   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "status" : "PENDING",
       "url" : "https://travis-ci.com/flink-ci/flink/builds/150628235",
       "triggerID" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5616",
       "triggerID" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * baea1576fa2ca73f42fb3460e4ee11c073eddd6e Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/150628235) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5616) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384452738
 
 

 ##########
 File path: tools/azure-pipelines/jobs-template.yml
 ##########
 @@ -177,6 +145,8 @@ jobs:
     - script: ./tools/azure-pipelines/setup_kubernetes.sh
     - script: M2_HOME=/home/vsts/maven_cache/apache-maven-3.2.5/ PATH=/home/vsts/maven_cache/apache-maven-3.2.5/bin:$PATH ${{parameters.environment}} STAGE=compile ./tools/azure_controller.sh compile
       displayName: Build
+    - script: FLINK_DIR=build-target ./flink-end-to-end-tests/run-pre-commit-tests.sh
 
 Review comment:
   Very good point, thanks!


[GitHub] [flink] rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384483762
 
 

 ##########
 File path: tools/azure-pipelines/jobs-template.yml
 ##########
 @@ -132,20 +134,26 @@ jobs:
         restoreKeys: $(CACHE_FALLBACK_KEY)
         path: $(MAVEN_CACHE_FOLDER)
       displayName: Cache Maven local repo
+      continueOnError: true
     
     # download artifacts
     - task: DownloadPipelineArtifact@2
       inputs:
         path: $(CACHE_FLINK_DIR)
         artifact: FlinkCompileCacheDir-${{parameters.stage_name}}
+    - script: |
+        echo "##vso[task.setvariable variable=JAVA_HOME]$(JAVA_HOME_11_X64)"
 
 Review comment:
   Correct, this is a bit ugly. I basically decide whether we are using jdk11 or not based on the container name (that is not used in this context).
   
   I can address this as follows: I build a generic docker image (that does not distinguish between jdk8 / jdk11 because it contains both), and I switch the jdk version for docker and azure machines the same way.
   
   Are you okay if I address this in a follow-up? (I can only test this during the night, because otherwise I would overwhelm the available machines)
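
   A minimal sketch of what "switching the jdk version the same way" on Docker and Azure machines could look like. The `select_jdk` helper is hypothetical and not part of the PR; it assumes that `JAVA_HOME_<version>_X64` variables (such as the `JAVA_HOME_11_X64` used in the diff) are set both on the Azure machines and inside a generic image. The fallback path in the demo is a placeholder only.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR: select a JDK by major version.
# Assumes JAVA_HOME_<version>_X64 is set on the machine or in the image.
select_jdk() {
    local version="$1" var_name jdk_home
    # Accept plain numbers only, since the value is used to build a variable name.
    case "$version" in
        ''|*[!0-9]*) echo "Invalid JDK version: '${version}'" >&2; return 1 ;;
    esac
    var_name="JAVA_HOME_${version}_X64"
    # Look up the variable whose name is stored in var_name.
    eval "jdk_home=\${${var_name}:-}"
    if [ -z "$jdk_home" ]; then
        echo "No JDK configured for version ${version} (${var_name} is unset)" >&2
        return 1
    fi
    export JAVA_HOME="$jdk_home"
    # On Linux, PATH entries are separated by ':'.
    export PATH="${JAVA_HOME}/bin:${PATH}"
}

# Demo with a placeholder location if the variable is not already set.
JAVA_HOME_11_X64="${JAVA_HOME_11_X64:-/usr/lib/jvm/java-11}"
select_jdk 11 && echo "JAVA_HOME=${JAVA_HOME}"
```

   With a helper like this, the pipeline step would no longer need to look at the container name; it would only pass the desired version.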


[GitHub] [flink] zentol commented on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
zentol commented on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#issuecomment-591447821
 
 
   After an offline chat with @rmetzger we concluded that the caching for the doc gems may not be necessary.
   
   I'm fine with delaying the inclusion of java e2e tests; at this point you want to load test things anyway and they don't add _that_ much additional overhead. Additionally we would need a convenient hook that enables all e2e tests.
   
   The jdk11 container stuff is indeed icky, but I'm fine with merging it as is _for the time being_; it's certainly not a state we can stay in for a longer period.


[GitHub] [flink] zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384443539
 
 

 ##########
 File path: tools/azure-pipelines/jobs-template.yml
 ##########
 @@ -132,20 +134,26 @@ jobs:
         restoreKeys: $(CACHE_FALLBACK_KEY)
         path: $(MAVEN_CACHE_FOLDER)
       displayName: Cache Maven local repo
+      continueOnError: true
     
     # download artifacts
     - task: DownloadPipelineArtifact@2
       inputs:
         path: $(CACHE_FLINK_DIR)
         artifact: FlinkCompileCacheDir-${{parameters.stage_name}}
+    - script: |
+        echo "##vso[task.setvariable variable=JAVA_HOME]$(JAVA_HOME_11_X64)"
 
 Review comment:
   Is this to have the tests run on java 11 if java 11 is available on the machine? Shouldn't the container have this stuff set up already (or in other words, what's the benefit in having distinct images if extra steps are required to select the jdk version)?


[GitHub] [flink] rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384475047
 
 

 ##########
 File path: flink-end-to-end-tests/test-scripts/common_kubernetes.sh
 ##########
 @@ -51,10 +51,18 @@ function check_kubernetes_status {
 function start_kubernetes_if_not_running {
     if ! check_kubernetes_status; then
         echo "Starting minikube ..."
-        start_command="minikube start"
         # We need sudo permission to set vm-driver to none in linux os.
-        [[ "${OS_TYPE}" = "linux" ]] && start_command="sudo CHANGE_MINIKUBE_NONE_USER=true ${start_command} --vm-driver=none"
-        ${start_command}
+        if [[ "${OS_TYPE}" = "linux" ]] ; then
+            sudo CHANGE_MINIKUBE_NONE_USER=true minikube start --vm-driver=none \
+                --extra-config=kubelet.image-gc-high-threshold=99 \
+                --extra-config=kubelet.image-gc-low-threshold=98 \
+                --extra-config=kubelet.minimum-container-ttl-duration=120m \
+                --extra-config=kubelet.eviction-hard="memory.available<5Mi,nodefs.available<1Mi,imagefs.available<1Mi" \
+                --extra-config=kubelet.eviction-soft="memory.available<5Mi,nodefs.available<2Mi,imagefs.available<2Mi" \
+                --extra-config=kubelet.eviction-soft-grace-period="memory.available=2h,nodefs.available=2h,imagefs.available=2h"
 
 Review comment:
   The VMs provided by Azure have 100GB of disk space, of which 85% is already allocated, so only 15GB are free.
   That's enough space for our purposes.
   However, the Kubernetes nodes running during the k8s tests consider 85% disk usage too high, so they start garbage collecting on their host.
   During GC, they delete all Docker images that are currently not in use.
   However, the k8s test first builds a Flink image and then launches things on k8s. Sometimes k8s deletes the newly built Flink image, so it cannot find it anymore and the test fails or times out.
   
   That's why I have set the GC thresholds to 98% and 99% :) 
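
   The threshold reasoning above can be sketched as a small check. The `image_gc_would_start` helper is hypothetical and only illustrates the comparison; it assumes that image GC starts once disk usage reaches the high threshold, and that 85 is the kubelet default for `image-gc-high-threshold`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Flink test scripts.
# Returns success (0) if image GC would start for the given disk usage,
# assuming GC starts once usage reaches the high threshold (both in percent).
image_gc_would_start() {
    local used_percent="$1"
    local high_threshold="$2"
    [ "$used_percent" -ge "$high_threshold" ]
}

# 85% disk usage, as on the Azure VMs described above.
used=85
# 85 is assumed to be the kubelet default; 99 is the value set in the diff.
for threshold in 85 99; do
    if image_gc_would_start "$used" "$threshold"; then
        echo "threshold=${threshold}%: image GC would start"
    else
        echo "threshold=${threshold}%: image GC would not start"
    fi
done
```

   Under these assumptions, raising the high threshold to 99 keeps the freshly built Flink image from being collected at 85% disk usage.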


[GitHub] [flink] rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384475694
 
 

 ##########
 File path: flink-end-to-end-tests/test-scripts/common_kubernetes.sh
 ##########
 @@ -51,10 +51,18 @@ function check_kubernetes_status {
 function start_kubernetes_if_not_running {
     if ! check_kubernetes_status; then
         echo "Starting minikube ..."
-        start_command="minikube start"
         # We need sudo permission to set vm-driver to none in linux os.
-        [[ "${OS_TYPE}" = "linux" ]] && start_command="sudo CHANGE_MINIKUBE_NONE_USER=true ${start_command} --vm-driver=none"
-        ${start_command}
+        if [[ "${OS_TYPE}" = "linux" ]] ; then
+            sudo CHANGE_MINIKUBE_NONE_USER=true minikube start --vm-driver=none \
+                --extra-config=kubelet.image-gc-high-threshold=99 \
+                --extra-config=kubelet.image-gc-low-threshold=98 \
+                --extra-config=kubelet.minimum-container-ttl-duration=120m \
+                --extra-config=kubelet.eviction-hard="memory.available<5Mi,nodefs.available<1Mi,imagefs.available<1Mi" \
+                --extra-config=kubelet.eviction-soft="memory.available<5Mi,nodefs.available<2Mi,imagefs.available<2Mi" \
+                --extra-config=kubelet.eviction-soft-grace-period="memory.available=2h,nodefs.available=2h,imagefs.available=2h"
 
 Review comment:
   Other people have a similar collection of configuration parameters https://github.com/swaroopar/automation/blob/c1643b1cb66e6591ec33c9e8569ad5721b824052/minikube/start-minikube.sh :) 


[GitHub] [flink] zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384442874
 
 

 ##########
 File path: tools/azure-pipelines/build-apache-repo.yml
 ##########
 @@ -48,3 +58,60 @@ stages:
           e2e_pool_definition:
             vmImage: 'ubuntu-16.04'
           environment: PROFILE="-Dhadoop.version=2.8.3 -Dinclude_hadoop_aws -Dscala-2.11"
+          run_end_to_end: false
+          container: flink-build-jdk8
+  # Special stage for nightly builds:
+  - stage: cron_build
+    displayName: "Cron build"
+    dependsOn: [] # depending on an empty array makes the stages run in parallel
+    condition: or(eq(variables['Build.Reason'], 'Schedule'), eq(variables['MODE'], 'nightly'))
+    jobs:
+      - template: jobs-template.yml
+        parameters:
+          stage_name: cron_build_hadoop241
+          test_pool_definition:
+            name: Default
+          e2e_pool_definition:
+            vmImage: 'ubuntu-16.04'
+          environment: PROFILE="-Dhadoop.version=2.4.1 -Pskip-hive-tests"
+          run_end_to_end: true
+          container: flink-build-jdk8
+      - template: jobs-template.yml
+        parameters:
+          stage_name: cron_build_scala2_12
+          test_pool_definition:
+            name: Default
+          e2e_pool_definition:
+            vmImage: 'ubuntu-16.04'
+          environment: PROFILE="-Dhadoop.version=2.8.3 -Dinclude_hadoop_aws -Dscala-2.12 -Phive-1.2.1"
+          run_end_to_end: true
+          container: flink-build-jdk8
+      - template: jobs-template.yml
+        parameters:
+          stage_name: cron_build_jdk11
+          test_pool_definition:
+            name: Default
+          e2e_pool_definition:
+            vmImage: 'ubuntu-16.04'
+          environment: PROFILE="-Dhadoop.version=2.8.3 -Dinclude_hadoop_aws -Dscala-2.11 -Djdk11"
+          run_end_to_end: true
+          container: flink-build-jdk11
+      - template: jobs-template.yml
+        parameters:
+          stage_name: cron_build_hadoopfree
+          test_pool_definition:
+            name: Default
+          e2e_pool_definition:
+            vmImage: 'ubuntu-16.04'
+          environment: PROFILE=""
+          run_end_to_end: true
+          container: flink-build-jdk8
+      - job: docs_404_check # run on a MSFT provided machine
 
 Review comment:
   where are you caching the gem artifacts?


[GitHub] [flink] zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384470615
 
 

 ##########
 File path: tools/azure-pipelines/jobs-template.yml
 ##########
 @@ -132,20 +134,26 @@ jobs:
         restoreKeys: $(CACHE_FALLBACK_KEY)
         path: $(MAVEN_CACHE_FOLDER)
       displayName: Cache Maven local repo
+      continueOnError: true
     
     # download artifacts
     - task: DownloadPipelineArtifact@2
       inputs:
         path: $(CACHE_FLINK_DIR)
         artifact: FlinkCompileCacheDir-${{parameters.stage_name}}
+    - script: |
+        echo "##vso[task.setvariable variable=JAVA_HOME]$(JAVA_HOME_11_X64)"
 
 Review comment:
   But ... there is a condition for this job(?) that ${{parameters.container}} must be `flink-build-jdk11`. Are we _not_ using `rmetzger/flink-ci:ubuntu-jdk11-amd64-2a765ab` when running on azure machines?


[GitHub] [flink] rmetzger closed pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
rmetzger closed pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222
 
 
   


[GitHub] [flink] zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384466502
 
 

 ##########
 File path: tools/azure-pipelines/jobs-template.yml
 ##########
 @@ -159,9 +167,15 @@ jobs:
         restoreKeys: $(CACHE_FALLBACK_KEY)
         path: $(MAVEN_CACHE_FOLDER)
       displayName: Cache Maven local repo
+      continueOnError: true
+    - script: |
+        echo "##vso[task.setvariable variable=JAVA_HOME]$(JAVA_HOME_11_X64)"
+        echo "##vso[task.setvariable variable=PATH]$(JAVA_HOME_11_X64)/bin;$(PATH)"
+      displayName: "Set to jdk11"
+      condition: eq('${{parameters.container}}', 'flink-build-jdk11')
     - script: ./tools/travis/setup_maven.sh
     - script: ./tools/azure-pipelines/setup_kubernetes.sh
-    - script: M2_HOME=/home/vsts/maven_cache/apache-maven-3.2.5/ PATH=/home/vsts/maven_cache/apache-maven-3.2.5/bin:$PATH PROFILE="-Dinclude-hadoop -Dhadoop.version=2.8.3 -De2e-metrics -Dmaven.wagon.http.pool=false" STAGE=compile ./tools/azure_controller.sh compile
+    - script: M2_HOME=/home/vsts/maven_cache/apache-maven-3.2.5/ PATH=/home/vsts/maven_cache/apache-maven-3.2.5/bin:$PATH ${{parameters.environment}} STAGE=compile ./tools/azure_controller.sh compile
       displayName: Build
     - script: FLINK_DIR=`pwd`/build-target flink-end-to-end-tests/run-nightly-tests.sh
 
 Review comment:
   only the pre-commit e2e tests, which is 1 out of 4 right now.


[GitHub] [flink] flinkbot edited a comment on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#issuecomment-591357656
 
 
   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "status" : "CANCELED",
       "url" : "https://travis-ci.com/flink-ci/flink/builds/150628235",
       "triggerID" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5616",
       "triggerID" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "721fdc15d019f8ba8205962beab7665c45ee91e7",
       "status" : "PENDING",
       "url" : "https://travis-ci.com/flink-ci/flink/builds/150651151",
       "triggerID" : "721fdc15d019f8ba8205962beab7665c45ee91e7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "721fdc15d019f8ba8205962beab7665c45ee91e7",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5630",
       "triggerID" : "721fdc15d019f8ba8205962beab7665c45ee91e7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * baea1576fa2ca73f42fb3460e4ee11c073eddd6e Travis: [CANCELED](https://travis-ci.com/flink-ci/flink/builds/150628235) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5616) 
   * 721fdc15d019f8ba8205962beab7665c45ee91e7 Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/150651151) Azure: [PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5630) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384406551
 
 

 ##########
 File path: flink-end-to-end-tests/test-scripts/common_yarn_docker.sh
 ##########
 @@ -129,21 +129,31 @@ END
     docker exec master bash -c "cat /home/hadoop-user/$FLINK_DIRNAME/conf/flink-conf.yaml"
 }
 
-function copy_and_show_logs {
-    mkdir -p $TEST_DATA_DIR/logs
+function debug_copy_and_show_logs {
+    echo "Debugging failed YARN Docker test:"
+    echo "Currently running containers"
+    docker ps
+
+    echo "Currently running JVMs"
+    jps -v
+
     echo "Hadoop logs:"
-    docker cp master:/var/log/hadoop/* $TEST_DATA_DIR/logs/
-    for f in $TEST_DATA_DIR/logs/*; do
+    mkdir -p $TEST_DATA_DIR/logs
+    docker cp master:/var/log/hadoop/ $TEST_DATA_DIR/logs/
+    ls -lisah $TEST_DATA_DIR/logs/hadoop
+    for f in $TEST_DATA_DIR/logs/hadoop/*; do
         echo "$f:"
         cat $f
     done
+    
     echo "Docker logs:"
     docker logs master
 
     echo "Flink logs:"
     docker exec master bash -c "kinit -kt /home/hadoop-user/hadoop-user.keytab hadoop-user"
     docker exec master bash -c "yarn application -list -appStates ALL"
     application_id=`docker exec master bash -c "yarn application -list -appStates ALL" | grep "Flink" | grep "cluster" | awk '{print \$1}'`
+    
 
 Review comment:
   I added these blank lines to have more visual structure in this method (which has grown a bit)


[GitHub] [flink] zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384442204
 
 

 ##########
 File path: tools/azure-pipelines/build-apache-repo.yml
 ##########
 @@ -48,3 +58,60 @@ stages:
           e2e_pool_definition:
             vmImage: 'ubuntu-16.04'
           environment: PROFILE="-Dhadoop.version=2.8.3 -Dinclude_hadoop_aws -Dscala-2.11"
+          run_end_to_end: false
+          container: flink-build-jdk8
+  # Special stage for nightly builds:
+  - stage: cron_build
+    displayName: "Cron build"
+    dependsOn: [] # depending on an empty array makes the stages run in parallel
+    condition: or(eq(variables['Build.Reason'], 'Schedule'), eq(variables['MODE'], 'nightly'))
+    jobs:
+      - template: jobs-template.yml
+        parameters:
+          stage_name: cron_build_hadoop241
+          test_pool_definition:
+            name: Default
+          e2e_pool_definition:
+            vmImage: 'ubuntu-16.04'
+          environment: PROFILE="-Dhadoop.version=2.4.1 -Pskip-hive-tests"
+          run_end_to_end: true
+          container: flink-build-jdk8
+      - template: jobs-template.yml
+        parameters:
+          stage_name: cron_build_scala2_12
 
 Review comment:
   I would stick to the naming convention of the hadoop profile, i.e. `cron_build_scala212`


[GitHub] [flink] flinkbot commented on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
flinkbot commented on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#issuecomment-591353080
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit baea1576fa2ca73f42fb3460e4ee11c073eddd6e (Wed Feb 26 10:25:49 UTC 2020)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
    * **This pull request references an unassigned [Jira ticket](https://issues.apache.org/jira/browse/FLINK-15834).** According to the [code contribution guide](https://flink.apache.org/contributing/contribute-code.html), tickets need to be assigned before starting with the implementation work.
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.<details>
    The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required. <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>


[GitHub] [flink] flinkbot edited a comment on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#issuecomment-591357656
 
 
   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "status" : "PENDING",
       "url" : "https://travis-ci.com/flink-ci/flink/builds/150628235",
       "triggerID" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5616",
       "triggerID" : "baea1576fa2ca73f42fb3460e4ee11c073eddd6e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "721fdc15d019f8ba8205962beab7665c45ee91e7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "721fdc15d019f8ba8205962beab7665c45ee91e7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * baea1576fa2ca73f42fb3460e4ee11c073eddd6e Travis: [PENDING](https://travis-ci.com/flink-ci/flink/builds/150628235) Azure: [FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=5616) 
   * 721fdc15d019f8ba8205962beab7665c45ee91e7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384456815
 
 

 ##########
 File path: tools/azure-pipelines/jobs-template.yml
 ##########
 @@ -159,9 +167,15 @@ jobs:
         restoreKeys: $(CACHE_FALLBACK_KEY)
         path: $(MAVEN_CACHE_FOLDER)
       displayName: Cache Maven local repo
+      continueOnError: true
+    - script: |
+        echo "##vso[task.setvariable variable=JAVA_HOME]$(JAVA_HOME_11_X64)"
+        echo "##vso[task.setvariable variable=PATH]$(JAVA_HOME_11_X64)/bin;$(PATH)"
+      displayName: "Set to jdk11"
+      condition: eq('${{parameters.container}}', 'flink-build-jdk11')
     - script: ./tools/travis/setup_maven.sh
     - script: ./tools/azure-pipelines/setup_kubernetes.sh
-    - script: M2_HOME=/home/vsts/maven_cache/apache-maven-3.2.5/ PATH=/home/vsts/maven_cache/apache-maven-3.2.5/bin:$PATH PROFILE="-Dinclude-hadoop -Dhadoop.version=2.8.3 -De2e-metrics -Dmaven.wagon.http.pool=false" STAGE=compile ./tools/azure_controller.sh compile
+    - script: M2_HOME=/home/vsts/maven_cache/apache-maven-3.2.5/ PATH=/home/vsts/maven_cache/apache-maven-3.2.5/bin:$PATH ${{parameters.environment}} STAGE=compile ./tools/azure_controller.sh compile
       displayName: Build
     - script: FLINK_DIR=`pwd`/build-target flink-end-to-end-tests/run-nightly-tests.sh
 
 Review comment:
   In my understanding these are executed in the `misc` profile, from the `travis_watchdog.sh` script. (See an example of a `misc` profile: https://dev.azure.com/rmetzger/Flink/_build/results?buildId=5590&view=logs&j=b09cc737-3452-5710-2b65-ee4e507c2164&t=8141c33c-fb34-5bae-c47a-dcb67e7c446b) 


[GitHub] [flink] zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384441453
 
 

 ##########
 File path: tools/azure-pipelines/jobs-template.yml
 ##########
 @@ -177,6 +145,8 @@ jobs:
     - script: ./tools/azure-pipelines/setup_kubernetes.sh
     - script: M2_HOME=/home/vsts/maven_cache/apache-maven-3.2.5/ PATH=/home/vsts/maven_cache/apache-maven-3.2.5/bin:$PATH ${{parameters.environment}} STAGE=compile ./tools/azure_controller.sh compile
       displayName: Build
+    - script: FLINK_DIR=build-target ./flink-end-to-end-tests/run-pre-commit-tests.sh
 
 Review comment:
   add a TODO to remove this since we're pretty much eliminating the concept of pre-commit e2e tests.


[GitHub] [flink] rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384453442
 
 

 ##########
 File path: tools/azure-pipelines/build-apache-repo.yml
 ##########
 @@ -48,3 +58,60 @@ stages:
           e2e_pool_definition:
             vmImage: 'ubuntu-16.04'
           environment: PROFILE="-Dhadoop.version=2.8.3 -Dinclude_hadoop_aws -Dscala-2.11"
+          run_end_to_end: false
+          container: flink-build-jdk8
+  # Special stage for nightly builds:
+  - stage: cron_build
+    displayName: "Cron build"
+    dependsOn: [] # depending on an empty array makes the stages run in parallel
+    condition: or(eq(variables['Build.Reason'], 'Schedule'), eq(variables['MODE'], 'nightly'))
+    jobs:
+      - template: jobs-template.yml
+        parameters:
+          stage_name: cron_build_hadoop241
+          test_pool_definition:
+            name: Default
+          e2e_pool_definition:
+            vmImage: 'ubuntu-16.04'
+          environment: PROFILE="-Dhadoop.version=2.4.1 -Pskip-hive-tests"
+          run_end_to_end: true
+          container: flink-build-jdk8
+      - template: jobs-template.yml
+        parameters:
+          stage_name: cron_build_scala2_12
+          test_pool_definition:
+            name: Default
+          e2e_pool_definition:
+            vmImage: 'ubuntu-16.04'
+          environment: PROFILE="-Dhadoop.version=2.8.3 -Dinclude_hadoop_aws -Dscala-2.12 -Phive-1.2.1"
+          run_end_to_end: true
+          container: flink-build-jdk8
+      - template: jobs-template.yml
+        parameters:
+          stage_name: cron_build_jdk11
+          test_pool_definition:
+            name: Default
+          e2e_pool_definition:
+            vmImage: 'ubuntu-16.04'
+          environment: PROFILE="-Dhadoop.version=2.8.3 -Dinclude_hadoop_aws -Dscala-2.11 -Djdk11"
+          run_end_to_end: true
+          container: flink-build-jdk11
+      - template: jobs-template.yml
+        parameters:
+          stage_name: cron_build_hadoopfree
+          test_pool_definition:
+            name: Default
+          e2e_pool_definition:
+            vmImage: 'ubuntu-16.04'
+          environment: PROFILE=""
+          run_end_to_end: true
+          container: flink-build-jdk8
+      - job: docs_404_check # run on a MSFT provided machine
 
 Review comment:
   I'm not caching them at all, but this job passes in 5 minutes, so I'm not concerned.

----------------------------------------------------------------

[GitHub] [flink] rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#discussion_r384452260
 
 

 ##########
 File path: flink-end-to-end-tests/test-scripts/common_kubernetes.sh
 ##########
 @@ -51,10 +51,18 @@ function check_kubernetes_status {
 function start_kubernetes_if_not_running {
     if ! check_kubernetes_status; then
         echo "Starting minikube ..."
-        start_command="minikube start"
         # We need sudo permission to set vm-driver to none in linux os.
-        [[ "${OS_TYPE}" = "linux" ]] && start_command="sudo CHANGE_MINIKUBE_NONE_USER=true ${start_command} --vm-driver=none"
-        ${start_command}
+        if [[ "${OS_TYPE}" = "linux" ]] ; then
+            sudo CHANGE_MINIKUBE_NONE_USER=true minikube start --vm-driver=none \
+                --extra-config=kubelet.image-gc-high-threshold=99 \
+                --extra-config=kubelet.image-gc-low-threshold=98 \
+                --extra-config=kubelet.minimum-container-ttl-duration=120m \
+                --extra-config=kubelet.eviction-hard="memory.available<5Mi,nodefs.available<1Mi,imagefs.available<1Mi" \
+                --extra-config=kubelet.eviction-soft="memory.available<5Mi,nodefs.available<2Mi,imagefs.available<2Mi" \
+                --extra-config=kubelet.eviction-soft-grace-period="memory.available=2h,nodefs.available=2h,imagefs.available=2h"
 
 Review comment:
   Not at all. This relates to the second problem I described above: `Problem: Low disk space let the kubelet delete unused docker images, including images generated for the test.`
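
   To illustrate why raising `image-gc-high-threshold` to 99 keeps the test images around, here is a minimal sketch of the trigger. It assumes a simplified model in which image garbage collection starts once disk usage reaches the high threshold. The helper names `disk_usage_percent` and `image_gc_would_trigger` are hypothetical and not part of the Flink test scripts; 85 is the documented kubelet default for the high threshold.

```shell
#!/usr/bin/env bash
# Sketch only: a simplified model of the kubelet's image GC trigger.
# Both helper functions are hypothetical and exist only for illustration.

# Print the used-space percentage (without the % sign) of the filesystem holding $1.
function disk_usage_percent {
    df -P "$1" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Succeed (exit code 0) if the usage reaches the given image-gc-high-threshold.
function image_gc_would_trigger {
    local usage_percent=$1
    local high_threshold=$2
    [ "${usage_percent}" -ge "${high_threshold}" ]
}

usage=$(disk_usage_percent /)
# Compare the kubelet default (85) with the value set in this PR (99).
for threshold in 85 99; do
    if image_gc_would_trigger "${usage}" "${threshold}"; then
        echo "usage ${usage}% >= ${threshold}%: kubelet may delete unused images"
    else
        echo "usage ${usage}% < ${threshold}%: images are kept"
    fi
done
```

   On a CI machine whose disk is, say, 90% full, the default threshold would allow the kubelet to remove the freshly built test image, while the threshold of 99 would not.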

----------------------------------------------------------------

[GitHub] [flink] zentol edited a comment on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements

Posted by GitBox <gi...@apache.org>.
zentol edited a comment on issue #11222: [FLINK-15834] Set up nightly builds in Azure & various CI improvements
URL: https://github.com/apache/flink/pull/11222#issuecomment-591447821
 
 
   After an offline chat with @rmetzger we concluded that the caching for the doc gems may not be necessary.
   
   I'm fine with delaying the inclusion of java e2e tests; at this point you want to load test things anyway and they don't add _that_ much additional overhead. Additionally we would need a convenient hook that enables all e2e tests.
   
   The jdk11 container stuff is indeed icky, but I'm fine with merging it as is _for the time being_; we get the answers we want now, but it's certainly not a state we can stay in for a longer period.

----------------------------------------------------------------