Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/05/20 14:07:25 UTC

[GitHub] [flink] rmetzger opened a new pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

rmetzger opened a new pull request #12268:
URL: https://github.com/apache/flink/pull/12268


   ## What is the purpose of the change
   
   Clean up the CI-related scripts in `tools/`.
   
   
   ## Brief change log
   
   For reviewing this change, I recommend starting from the `job-template.yml` file to see how the scripts are connected.
   
   - `travis_watchdog.sh` used to combine several concerns: test stage control (including the Python test invocation), debug artifact management (mostly uploading the artifacts), and test timeout control. The biggest issue was how the Python tests were integrated into that file. I moved the "watchdog" functionality into a separate file and created a new `test_controller.sh`; a rough sketch of the new call chain follows below.
   - `azure_controller.sh` used to be the entry point for the CI system, controlling the compile stage. I moved most of its logic into `tools/ci/compile.sh`.
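   
   To make the wiring easier to follow, here is a heavily condensed, hand-written sketch of the new call chain (the script and function names — `test_controller.sh`, `watchdog.sh`, `run_with_watchdog`, `run_mvn`, `print_stacktraces`, `DEBUG_FILES` — are the ones introduced in this PR; that the watched command's exit code ends up in `$CMD_EXIT` is inferred from the comments in `watchdog.sh`):
   
   ```bash
   # Azure step in jobs-template.yml (roughly):
   #   export DEBUG_FILES="$AGENT_TEMPDIRECTORY/debug_files"
   #   ./tools/ci/test_controller.sh <stage>
   
   # tools/ci/test_controller.sh, condensed sketch:
   HERE="$(cd "$(dirname "$0")" && pwd)"
   source "${HERE}/watchdog.sh"                 # provides run_with_watchdog
   source "${HERE}/controller_utils.sh"         # provides print_stacktraces, ...
   export WATCHDOG_CALLBACK_ON_TIMEOUT=print_stacktraces
   run_with_watchdog "run_mvn verify"           # the test run is monitored for inactivity
   exit "$(<"$CMD_EXIT")"                       # exit code of the watched command is forwarded
   ```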
   
   
   ## Verifying this change
   
   I have tested timing out builds (both regular Maven/Surefire/Java timeouts and Python) to make sure the refactored watchdog works and exit codes are properly forwarded.
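   
   A quick way to reproduce such a timeout locally might look like the following (a hedged sketch: `MAX_NO_OUTPUT`, `SLEEP_TIME`, `WATCHDOG_CALLBACK_ON_TIMEOUT` and `CMD_EXIT` are the knobs defined in `tools/ci/watchdog.sh`; that the exit code is written to `$CMD_EXIT` is inferred from that script's comments):
   
   ```bash
   source tools/ci/watchdog.sh
   
   export MAX_NO_OUTPUT=5 SLEEP_TIME=1                       # time out after ~5s of silence
   export WATCHDOG_CALLBACK_ON_TIMEOUT="echo timeout-callback"
   
   run_with_watchdog "sleep 60"                              # produces no output -> gets killed
   echo "watched command exited with $(<"$CMD_EXIT")"        # non-zero code should be forwarded
   ```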
   
   Once the PR has reached an acceptable state, I will also test the nightly builds on my personal Azure account to make sure the python wheels definition works.
   
   The separation of changes into separate commits is not optimal (some YARN changes in the refactoring commit are a bit unrelated).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] rmetzger commented on a change in pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #12268:
URL: https://github.com/apache/flink/pull/12268#discussion_r429910310



##########
File path: tools/azure-pipelines/jobs-template.yml
##########
@@ -163,6 +163,7 @@ jobs:
       IT_CASE_S3_SECRET_KEY: $(SECRET_S3_SECRET_KEY)
 
   - task: PublishTestResults@2
+    condition: succeededOrFailed()

Review comment:
       The `PublishTestResults` task will produce a warning that no tests were found.
   But we actually see this warning in every run, because the Python profile has no JUnit tests: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2125&view=results







[GitHub] [flink] rmetzger closed pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
rmetzger closed pull request #12268:
URL: https://github.com/apache/flink/pull/12268


   





[GitHub] [flink] flinkbot edited a comment on pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12268:
URL: https://github.com/apache/flink/pull/12268#issuecomment-631512695


   ## CI report:
   
   * 4ed6888375869e654816264124703e72439c6148 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1955) 
   * 76f1c9f6921d3453e476dba3be53860b53a489a6 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2126) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12268:
URL: https://github.com/apache/flink/pull/12268#issuecomment-631512695


   ## CI report:
   
   * 76f1c9f6921d3453e476dba3be53860b53a489a6 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2126) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] rmetzger commented on a change in pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #12268:
URL: https://github.com/apache/flink/pull/12268#discussion_r429914110



##########
File path: flink-yarn-tests/src/test/java/org/apache/flink/yarn/YarnTestBase.java
##########
@@ -1068,8 +1067,8 @@ public static void teardown() throws Exception {
 
 	}
 
-	public static boolean isOnTravis() {
-		return System.getenv("TRAVIS") != null && System.getenv("TRAVIS").equals("true");
+	public static boolean isOnCI() {
+		return System.getenv("IS_CI") != null && System.getenv("IS_CI").equals("true");

Review comment:
       Nope, we are not. What needs to be done for that?







[GitHub] [flink] rmetzger commented on a change in pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #12268:
URL: https://github.com/apache/flink/pull/12268#discussion_r429907884



##########
File path: tools/ci/test_controller.sh
##########
@@ -0,0 +1,127 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains generic control over the test execution.
+#
+
+HERE="`dirname \"$0\"`"             # relative
+HERE="`( cd \"$HERE\" && pwd )`"    # absolutized and normalized
+if [ -z "$HERE" ] ; then
+	exit 1
+fi
+
+source "${HERE}/stage.sh"
+source "${HERE}/maven-utils.sh"
+source "${HERE}/controller_utils.sh"
+source "${HERE}/watchdog.sh"
+TEST=$1
+
+# =============================================================================
+# Step 0: Check & print environment information & configure env
+# =============================================================================
+
+# check preconditions
+if [ -z "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable 'DEBUG_FILES' is not set but expected by test_controller.sh"

Review comment:
       I struggled with the name :)
   What about `TEST_OUTPUT_DIR`? Or `LOG_AND_TRACE_DIR`?

##########
File path: tools/ci/test_controller.sh
##########
@@ -0,0 +1,127 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains generic control over the test execution.
+#
+
+HERE="`dirname \"$0\"`"             # relative
+HERE="`( cd \"$HERE\" && pwd )`"    # absolutized and normalized
+if [ -z "$HERE" ] ; then
+	exit 1
+fi
+
+source "${HERE}/stage.sh"
+source "${HERE}/maven-utils.sh"
+source "${HERE}/controller_utils.sh"
+source "${HERE}/watchdog.sh"
+TEST=$1
+
+# =============================================================================
+# Step 0: Check & print environment information & configure env
+# =============================================================================
+
+# check preconditions
+if [ -z "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable 'DEBUG_FILES' is not set but expected by test_controller.sh"

Review comment:
       I will extend the message.







[GitHub] [flink] flinkbot commented on pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #12268:
URL: https://github.com/apache/flink/pull/12268#issuecomment-631512695


   ## CI report:
   
   * 4ed6888375869e654816264124703e72439c6148 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] rmetzger commented on pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
rmetzger commented on pull request #12268:
URL: https://github.com/apache/flink/pull/12268#issuecomment-633452614


   Thanks a lot for the review!
   
   I added two more commits, and I restored the deploy-to-Maven file.
   
   I now consider this PR ready to merge, modulo the git history cleanup and your feedback.





[GitHub] [flink] rmetzger commented on a change in pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #12268:
URL: https://github.com/apache/flink/pull/12268#discussion_r429918097



##########
File path: tools/ci/test_controller.sh
##########
@@ -0,0 +1,127 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains generic control over the test execution.
+#
+
+HERE="`dirname \"$0\"`"             # relative
+HERE="`( cd \"$HERE\" && pwd )`"    # absolutized and normalized
+if [ -z "$HERE" ] ; then
+	exit 1
+fi
+
+source "${HERE}/stage.sh"
+source "${HERE}/maven-utils.sh"
+source "${HERE}/controller_utils.sh"
+source "${HERE}/watchdog.sh"
+TEST=$1
+
+# =============================================================================
+# Step 0: Check & print environment information & configure env
+# =============================================================================
+
+# check preconditions
+if [ -z "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable 'DEBUG_FILES' is not set but expected by test_controller.sh"
+	exit 1
+fi
+
+if [ ! -d "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable DEBUG_FILES=$DEBUG_FILES points to a directory that does not exist"
+	exit 1
+fi
+
+if [ -z "$TEST" ] ; then
+	echo "ERROR: Environment variable 'TEST' is not set but expected by test_controller.sh"
+	exit 1
+fi
+
+echo "Printing environment information"
+
+echo "PATH=$PATH"
+run_mvn -version
+echo "Commit: $(git rev-parse HEAD)"
+print_system_info
+
+# enable coredumps for this process
+ulimit -c unlimited
+
+# configure JVMs to produce heap dumps
+export JAVA_TOOL_OPTIONS="-XX:+HeapDumpOnOutOfMemoryError"
+
+# some tests provide additional logs if they find this variable
+export IS_CI=true
+
+# =============================================================================
+# Step 1: Compile Flink (again)

Review comment:
       Thanks, I will update the comment







[GitHub] [flink] flinkbot edited a comment on pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12268:
URL: https://github.com/apache/flink/pull/12268#issuecomment-631512695


   ## CI report:
   
   * 4ed6888375869e654816264124703e72439c6148 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1955) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] rmetzger commented on a change in pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #12268:
URL: https://github.com/apache/flink/pull/12268#discussion_r429974586



##########
File path: tools/ci/watchdog.sh
##########
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains a watchdog tool to run monitor and potentially kill tasks
+# not producing any output for n seconds.
+#
+
+# Number of seconds w/o output before printing a stack trace and killing the watched process
+MAX_NO_OUTPUT=${MAX_NO_OUTPUT:-900}
+
+# Number of seconds to sleep before checking the output again
+SLEEP_TIME=${SLEEP_TIME:-20}
+
+CMD_OUT=${CMD_OUT:-"/tmp/watchdog.out"}
+CMD_PID=${CMD_PID:-"/tmp/watchdog.pid"}
+CMD_EXIT=${CMD_EXIT:-"/tmp/watchdog.exit"}
+
+
+# =============================================
+# Utility functions
+# ============================================= 
+
+mod_time () {
+	echo `stat -c "%Y" $CMD_OUT`
+}
+
+the_time() {
+	echo `date +%s`
+}
+
+# watchdog process
+
+watchdog () {
+	touch $CMD_OUT
+
+	while true; do
+		sleep $SLEEP_TIME
+
+		time_diff=$((`the_time` - `mod_time`))
+
+		if [ $time_diff -ge $MAX_NO_OUTPUT ]; then
+			echo "=============================================================================="
+			echo "Process produced no output for ${MAX_NO_OUTPUT} seconds."
+			echo "=============================================================================="
+
+			# run timeout callback
+			$WATCHDOG_CALLBACK_ON_TIMEOUT
+
+			echo "Killing process with pid=$(<$CMD_PID) and all descendants"
+			pkill -P $(<$CMD_PID) # kill descendants
+			kill $(<$CMD_PID) # kill process itself
+
+			exit 1
+		fi
+	done
+}
+
+assume_available () {
+	VAR=$1
+	if [ -z "$VAR" ] ; then
+		echo "ERROR: Environment variable '$VAR' is not set but expected by watchdog.sh"
+		exit 1
+	fi
+}
+
+# =============================================
+# main function
+# =============================================
+
+# entrypoint
+function run_with_watchdog() {
+	local cmd="$1"
+
+	# check preconditions
+	assume_available CMD_OUT # used for writing the process output (to check for activity)
+	assume_available CMD_PID # location of file to write process id to
+	assume_available CMD_EXIT # location of file to writ exit code to
+	assume_available WATCHDOG_CALLBACK_ON_TIMEOUT # bash function to call on timeout
+
+	watchdog &

Review comment:
       `CMD_PID` is **not** a variable containing the command process id. It is the name of the file used to store and retrieve the command process id. In that sense, it should rather be considered internal configuration, not an argument to be passed around.
    
    There is probably no hard argument for either of the two approaches we are discussing being better or worse, because both work.
    
    But in my opinion, readability is a bit better when the names of the helper files are defined globally, instead of defining them globally for the main function but passing them to the helper functions as arguments. Something that is passed as an argument to a function is usually considered part of the actual work being done.
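    
    For illustration, the two styles under discussion might look roughly like this (a condensed, hypothetical sketch with made-up helper names, not the actual code in the PR):
    
    ```bash
    # Style A (as in this PR): the pid file name is global "internal configuration".
    CMD_PID=${CMD_PID:-"/tmp/watchdog.pid"}
    kill_watched_process_a () {
        kill "$(<"$CMD_PID")"          # the helper reads the global directly
    }
    
    # Style B (explicit arguments): the caller passes the pid file around,
    # so precondition checks live in one place.
    kill_watched_process_b () {
        local pid_file="$1"
        kill "$(<"$pid_file")"
    }
    kill_watched_process_b "/tmp/watchdog.pid"
    ```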

##########
File path: tools/azure-pipelines/jobs-template.yml
##########
@@ -121,15 +128,34 @@ jobs:
     continueOnError: true # continue the build even if the cache fails.
     condition: not(eq('${{parameters.test_pool_definition.name}}', 'Default'))
     displayName: Cache Maven local repo
+
   - script: |
       echo "##vso[task.setvariable variable=JAVA_HOME]$JAVA_HOME_11_X64"
       echo "##vso[task.setvariable variable=PATH]$JAVA_HOME_11_X64/bin:$PATH"
     displayName: "Set to jdk11"
     condition: eq('${{parameters.jdk}}', 'jdk11')  
+
   - script: sudo sysctl -w kernel.core_pattern=core.%p
     displayName: Set coredump pattern
+
   # Test
-  - script: STAGE=test ${{parameters.environment}} ./tools/azure-pipelines/azure_controller.sh $(module)
+  - script: |
+      ./tools/azure-pipelines/unpack_build_artifact.sh
+      export DEBUG_FILES="$AGENT_TEMPDIRECTORY/debug_files"

Review comment:
       Thanks for your feedback. I will drastically shorten the `- script: |` sections.
   

##########
File path: tools/azure-pipelines/build-python-wheels.yml
##########
@@ -24,7 +24,6 @@ jobs:
       # Compile
       - script: |
           ${{parameters.environment}} ./tools/ci/compile.sh
-          ./tools/azure-pipelines/create_build_artifact.sh

Review comment:
       Ah, I see. I will look into this :) 

##########
File path: flink-yarn-tests/src/test/java/org/apache/flink/yarn/YarnTestBase.java
##########
@@ -1068,8 +1067,8 @@ public static void teardown() throws Exception {
 
 	}
 
-	public static boolean isOnTravis() {
-		return System.getenv("TRAVIS") != null && System.getenv("TRAVIS").equals("true");
+	public static boolean isOnCI() {
+		return System.getenv("IS_CI") != null && System.getenv("IS_CI").equals("true");

Review comment:
       Cool, thx. Will do.

##########
File path: tools/ci/maven-utils.sh
##########
@@ -73,7 +73,7 @@ function collect_coredumps {
 	echo "Searching for .dump, .dumpstream and related files in '$SEARCHDIR'"
 	for file in `find $SEARCHDIR -type f -regextype posix-extended -iregex '.*\.hprof|.*\.dump|.*\.dumpstream|.*hs.*\.log|.*/core(.[0-9]+)?$'`; do
 		echo "Moving '$file' to target directory ('$TARGET_DIR')"
-		mv $file $TARGET_DIR/
+		mv $file $TARGET_DIR/$(echo $file | tr "/" "-")

Review comment:
       I just checked, and these dumpstream files contain, for example, just the following:
    ```
    # Created at 2020-05-25T10:24:38.843
    Picked up JAVA_TOOL_OPTIONS: -XX:+HeapDumpOnOutOfMemoryError
    ```
    
    I guess the Surefire plugin creates a dumpstream file if it receives any output, but only fails the build if it receives unexpected output?
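    
    The rename in the hunk above flattens the file's full path into its new file name, so dumps with the same base name from different modules cannot collide; for a hypothetical path:
    
    ```bash
    file="./flink-runtime/target/foo.dumpstream"
    echo "$file" | tr "/" "-"
    # prints: .-flink-runtime-target-foo.dumpstream
    ```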
   

##########
File path: tools/azure-pipelines/build-python-wheels.yml
##########
@@ -24,7 +24,6 @@ jobs:
       # Compile
       - script: |
           ${{parameters.environment}} ./tools/ci/compile.sh
-          ./tools/azure-pipelines/create_build_artifact.sh

Review comment:
       Initial tests suggest that we can really remove all exclusions :) 

##########
File path: tools/ci/watchdog.sh
##########
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains a watchdog tool to run monitor and potentially kill tasks
+# not producing any output for n seconds.
+#
+
+# Number of seconds w/o output before printing a stack trace and killing the watched process
+MAX_NO_OUTPUT=${MAX_NO_OUTPUT:-900}
+
+# Number of seconds to sleep before checking the output again
+SLEEP_TIME=${SLEEP_TIME:-20}
+
+CMD_OUT=${CMD_OUT:-"/tmp/watchdog.out"}
+CMD_PID=${CMD_PID:-"/tmp/watchdog.pid"}
+CMD_EXIT=${CMD_EXIT:-"/tmp/watchdog.exit"}
+
+
+# =============================================
+# Utility functions
+# ============================================= 
+
+mod_time () {
+	echo `stat -c "%Y" $CMD_OUT`
+}
+
+the_time() {
+	echo `date +%s`
+}
+
+# watchdog process
+
+watchdog () {
+	touch $CMD_OUT
+
+	while true; do
+		sleep $SLEEP_TIME
+
+		time_diff=$((`the_time` - `mod_time`))
+
+		if [ $time_diff -ge $MAX_NO_OUTPUT ]; then
+			echo "=============================================================================="
+			echo "Process produced no output for ${MAX_NO_OUTPUT} seconds."
+			echo "=============================================================================="
+
+			# run timeout callback
+			$WATCHDOG_CALLBACK_ON_TIMEOUT
+
+			echo "Killing process with pid=$(<$CMD_PID) and all descendants"
+			pkill -P $(<$CMD_PID) # kill descendants
+			kill $(<$CMD_PID) # kill process itself
+
+			exit 1
+		fi
+	done
+}
+
+assume_available () {
+	VAR=$1
+	if [ -z "$VAR" ] ; then
+		echo "ERROR: Environment variable '$VAR' is not set but expected by watchdog.sh"
+		exit 1
+	fi
+}
+
+# =============================================
+# main function
+# =============================================
+
+# entrypoint
+function run_with_watchdog() {
+	local cmd="$1"
+
+	# check preconditions
+	assume_available CMD_OUT # used for writing the process output (to check for activity)
+	assume_available CMD_PID # location of file to write process id to
+	assume_available CMD_EXIT # location of file to writ exit code to
+	assume_available WATCHDOG_CALLBACK_ON_TIMEOUT # bash function to call on timeout
+
+	watchdog &

Review comment:
       Thank you for your clarifications! I now understand your concerns. 
   I agree that the use of `assume_available` for something internal is weird.
   
   I will push an updated version of the script.

##########
File path: tools/azure-pipelines/jobs-template.yml
##########
@@ -64,14 +65,16 @@ jobs:
     displayName: "Set to jdk11"
     condition: eq('${{parameters.jdk}}', 'jdk11')
   # Compile
-  - script: STAGE=compile ${{parameters.environment}} ./tools/azure_controller.sh compile
-    displayName: Build
+  - script: |
+      ${{parameters.environment}} ./tools/ci/compile.sh || exit $?
+      ./tools/azure-pipelines/create_build_artifact.sh
+    displayName: Compile
 
   # upload artifacts for next stage
   - task: PublishPipelineArtifact@1
     inputs:
-      path: $(CACHE_FLINK_DIR)

Review comment:
       The documentation does not mention the parameter anymore. Since it still works, I assume it's deprecated.







[GitHub] [flink] zentol commented on a change in pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #12268:
URL: https://github.com/apache/flink/pull/12268#discussion_r429961569



##########
File path: tools/ci/watchdog.sh
##########
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains a watchdog tool to run monitor and potentially kill tasks
+# not producing any output for n seconds.
+#
+
+# Number of seconds w/o output before printing a stack trace and killing the watched process
+MAX_NO_OUTPUT=${MAX_NO_OUTPUT:-900}
+
+# Number of seconds to sleep before checking the output again
+SLEEP_TIME=${SLEEP_TIME:-20}
+
+CMD_OUT=${CMD_OUT:-"/tmp/watchdog.out"}
+CMD_PID=${CMD_PID:-"/tmp/watchdog.pid"}
+CMD_EXIT=${CMD_EXIT:-"/tmp/watchdog.exit"}
+
+
+# =============================================
+# Utility functions
+# ============================================= 
+
+mod_time () {
+	echo `stat -c "%Y" $CMD_OUT`
+}
+
+the_time() {
+	echo `date +%s`
+}
+
+# watchdog process
+
+watchdog () {
+	touch $CMD_OUT
+
+	while true; do
+		sleep $SLEEP_TIME
+
+		time_diff=$((`the_time` - `mod_time`))
+
+		if [ $time_diff -ge $MAX_NO_OUTPUT ]; then
+			echo "=============================================================================="
+			echo "Process produced no output for ${MAX_NO_OUTPUT} seconds."
+			echo "=============================================================================="
+
+			# run timeout callback
+			$WATCHDOG_CALLBACK_ON_TIMEOUT
+
+			echo "Killing process with pid=$(<$CMD_PID) and all descendants"
+			pkill -P $(<$CMD_PID) # kill descendants
+			kill $(<$CMD_PID) # kill process itself
+
+			exit 1
+		fi
+	done
+}
+
+assume_available () {
+	VAR=$1
+	if [ -z "$VAR" ] ; then
+		echo "ERROR: Environment variable '$VAR' is not set but expected by watchdog.sh"
+		exit 1
+	fi
+}
+
+# =============================================
+# main function
+# =============================================
+
+# entrypoint
+function run_with_watchdog() {
+	local cmd="$1"
+
+	# check preconditions
+	assume_available CMD_OUT # used for writing the process output (to check for activity)
+	assume_available CMD_PID # location of file to write process id to
+	assume_available CMD_EXIT # location of file to writ exit code to
+	assume_available WATCHDOG_CALLBACK_ON_TIMEOUT # bash function to call on timeout
+
+	watchdog &

Review comment:
       It accesses `CMD_PID` etc. and the callback.

##########
File path: tools/azure-pipelines/jobs-template.yml
##########
@@ -121,15 +128,34 @@ jobs:
     continueOnError: true # continue the build even if the cache fails.
     condition: not(eq('${{parameters.test_pool_definition.name}}', 'Default'))
     displayName: Cache Maven local repo
+
   - script: |
       echo "##vso[task.setvariable variable=JAVA_HOME]$JAVA_HOME_11_X64"
       echo "##vso[task.setvariable variable=PATH]$JAVA_HOME_11_X64/bin:$PATH"
     displayName: "Set to jdk11"
     condition: eq('${{parameters.jdk}}', 'jdk11')  
+
   - script: sudo sysctl -w kernel.core_pattern=core.%p
     displayName: Set coredump pattern
+
   # Test
-  - script: STAGE=test ${{parameters.environment}} ./tools/azure-pipelines/azure_controller.sh $(module)
+  - script: |
+      ./tools/azure-pipelines/unpack_build_artifact.sh
+      export DEBUG_FILES="$AGENT_TEMPDIRECTORY/debug_files"

Review comment:
       I figured as much.
    
    I'd say that if it is not just calling scripts A, B, C, then add a separate script for it. What we absolutely want to prevent is the scripts being spread out over several places.
    TBH there's probably even merit in having each step call at most one script.

##########
File path: tools/azure-pipelines/build-python-wheels.yml
##########
@@ -24,7 +24,6 @@ jobs:
       # Compile
       - script: |
           ${{parameters.environment}} ./tools/ci/compile.sh
-          ./tools/azure-pipelines/create_build_artifact.sh

Review comment:
       The excludes are adding files _back_, increasing the file size.

##########
File path: tools/azure-pipelines/build-python-wheels.yml
##########
@@ -24,7 +24,6 @@ jobs:
       # Compile
       - script: |
           ${{parameters.environment}} ./tools/ci/compile.sh
-          ./tools/azure-pipelines/create_build_artifact.sh

Review comment:
       The excludes are adding files (jars) _back_, increasing the file size.

##########
File path: flink-yarn-tests/src/test/java/org/apache/flink/yarn/YarnTestBase.java
##########
@@ -1068,8 +1067,8 @@ public static void teardown() throws Exception {
 
 	}
 
-	public static boolean isOnTravis() {
-		return System.getenv("TRAVIS") != null && System.getenv("TRAVIS").equals("true");
+	public static boolean isOnCI() {
+		return System.getenv("IS_CI") != null && System.getenv("IS_CI").equals("true");

Review comment:
       Adjust the TravisDownloadCache(Factory) to detect Azure environment variables and determine the correct cache directory.

##########
File path: tools/ci/maven-utils.sh
##########
@@ -73,7 +73,7 @@ function collect_coredumps {
 	echo "Searching for .dump, .dumpstream and related files in '$SEARCHDIR'"
 	for file in `find $SEARCHDIR -type f -regextype posix-extended -iregex '.*\.hprof|.*\.dump|.*\.dumpstream|.*hs.*\.log|.*/core(.[0-9]+)?$'`; do
 		echo "Moving '$file' to target directory ('$TARGET_DIR')"
-		mv $file $TARGET_DIR/
+		mv $file $TARGET_DIR/$(echo $file | tr "/" "-")

Review comment:
       how could there be dumpstreams in multiple modules? Aren't we failing on the first test failure?

##########
File path: tools/ci/watchdog.sh
##########
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains a watchdog tool to run monitor and potentially kill tasks
+# not producing any output for n seconds.
+#
+
+# Number of seconds w/o output before printing a stack trace and killing the watched process
+MAX_NO_OUTPUT=${MAX_NO_OUTPUT:-900}
+
+# Number of seconds to sleep before checking the output again
+SLEEP_TIME=${SLEEP_TIME:-20}
+
+CMD_OUT=${CMD_OUT:-"/tmp/watchdog.out"}
+CMD_PID=${CMD_PID:-"/tmp/watchdog.pid"}
+CMD_EXIT=${CMD_EXIT:-"/tmp/watchdog.exit"}
+
+
+# =============================================
+# Utility functions
+# ============================================= 
+
+mod_time () {
+	echo `stat -c "%Y" $CMD_OUT`
+}
+
+the_time() {
+	echo `date +%s`
+}
+
+# watchdog process
+
+watchdog () {
+	touch $CMD_OUT
+
+	while true; do
+		sleep $SLEEP_TIME
+
+		time_diff=$((`the_time` - `mod_time`))
+
+		if [ $time_diff -ge $MAX_NO_OUTPUT ]; then
+			echo "=============================================================================="
+			echo "Process produced no output for ${MAX_NO_OUTPUT} seconds."
+			echo "=============================================================================="
+
+			# run timeout callback
+			$WATCHDOG_CALLBACK_ON_TIMEOUT
+
+			echo "Killing process with pid=$(<$CMD_PID) and all descendants"
+			pkill -P $(<$CMD_PID) # kill descendants
+			kill $(<$CMD_PID) # kill process itself
+
+			exit 1
+		fi
+	done
+}
+
+assume_available () {
+	VAR=$1
+	if [ -z "$VAR" ] ; then
+		echo "ERROR: Environment variable '$VAR' is not set but expected by watchdog.sh"
+		exit 1
+	fi
+}
+
+# =============================================
+# main function
+# =============================================
+
+# entrypoint
+function run_with_watchdog() {
+	local cmd="$1"
+
+	# check preconditions
+	assume_available CMD_OUT # used for writing the process output (to check for activity)
+	assume_available CMD_PID # location of file to write process id to
+	assume_available CMD_EXIT # location of file to writ exit code to
+	assume_available WATCHDOG_CALLBACK_ON_TIMEOUT # bash function to call on timeout
+
+	watchdog &

Review comment:
       I'm well aware what `CMD_PID` contains.
   
    I don't quite buy into it being an internal configuration when at the same time we allow it to be modified from the outside. If it is internal configuration there's no point in doing the assume_available checks.
   
   > There is probably no logical argument why the two approaches we are discussing are better or worse, because both approaches work.
   
    Meh. Clarity matters, especially when working on bash scripts. Using environment variables to pass around named arguments is common, and that's how I interpret `CMD_PID`. The same applies to `SLEEP_TIME` and `MAX_NO_OUTPUT`.
   If it is internal configuration then it shouldn't be modifiable; if it is external configuration then it should be explicitly passed around so you have a single place for checking preconditions, like assume_available.

##########
File path: tools/ci/watchdog.sh
##########
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains a watchdog tool to run monitor and potentially kill tasks
+# not producing any output for n seconds.
+#
+
+# Number of seconds w/o output before printing a stack trace and killing the watched process
+MAX_NO_OUTPUT=${MAX_NO_OUTPUT:-900}
+
+# Number of seconds to sleep before checking the output again
+SLEEP_TIME=${SLEEP_TIME:-20}
+
+CMD_OUT=${CMD_OUT:-"/tmp/watchdog.out"}
+CMD_PID=${CMD_PID:-"/tmp/watchdog.pid"}
+CMD_EXIT=${CMD_EXIT:-"/tmp/watchdog.exit"}
+
+
+# =============================================
+# Utility functions
+# ============================================= 
+
+mod_time () {
+	echo `stat -c "%Y" $CMD_OUT`
+}
+
+the_time() {
+	echo `date +%s`
+}
+
+# watchdog process
+
+watchdog () {
+	touch $CMD_OUT
+
+	while true; do
+		sleep $SLEEP_TIME
+
+		time_diff=$((`the_time` - `mod_time`))
+
+		if [ $time_diff -ge $MAX_NO_OUTPUT ]; then
+			echo "=============================================================================="
+			echo "Process produced no output for ${MAX_NO_OUTPUT} seconds."
+			echo "=============================================================================="
+
+			# run timeout callback
+			$WATCHDOG_CALLBACK_ON_TIMEOUT
+
+			echo "Killing process with pid=$(<$CMD_PID) and all descendants"
+			pkill -P $(<$CMD_PID) # kill descendants
+			kill $(<$CMD_PID) # kill process itself
+
+			exit 1
+		fi
+	done
+}
+
+assume_available () {
+	VAR=$1
+	if [ -z "$VAR" ] ; then
+		echo "ERROR: Environment variable '$VAR' is not set but expected by watchdog.sh"
+		exit 1
+	fi
+}
+
+# =============================================
+# main function
+# =============================================
+
+# entrypoint
+function run_with_watchdog() {
+	local cmd="$1"
+
+	# check preconditions
+	assume_available CMD_OUT # used for writing the process output (to check for activity)
+	assume_available CMD_PID # location of file to write process id to
+	assume_available CMD_EXIT # location of file to writ exit code to
+	assume_available WATCHDOG_CALLBACK_ON_TIMEOUT # bash function to call on timeout
+
+	watchdog &

Review comment:
       Note that I also referred to the callback, which is very much not internal configuration.

##########
File path: tools/ci/watchdog.sh
##########
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains a watchdog tool to run monitor and potentially kill tasks
+# not producing any output for n seconds.
+#
+
+# Number of seconds w/o output before printing a stack trace and killing the watched process
+MAX_NO_OUTPUT=${MAX_NO_OUTPUT:-900}
+
+# Number of seconds to sleep before checking the output again
+SLEEP_TIME=${SLEEP_TIME:-20}
+
+CMD_OUT=${CMD_OUT:-"/tmp/watchdog.out"}
+CMD_PID=${CMD_PID:-"/tmp/watchdog.pid"}
+CMD_EXIT=${CMD_EXIT:-"/tmp/watchdog.exit"}
+
+
+# =============================================
+# Utility functions
+# ============================================= 
+
+mod_time () {
+	echo `stat -c "%Y" $CMD_OUT`
+}
+
+the_time() {
+	echo `date +%s`
+}
+
+# watchdog process
+
+watchdog () {
+	touch $CMD_OUT
+
+	while true; do
+		sleep $SLEEP_TIME
+
+		time_diff=$((`the_time` - `mod_time`))
+
+		if [ $time_diff -ge $MAX_NO_OUTPUT ]; then
+			echo "=============================================================================="
+			echo "Process produced no output for ${MAX_NO_OUTPUT} seconds."
+			echo "=============================================================================="
+
+			# run timeout callback
+			$WATCHDOG_CALLBACK_ON_TIMEOUT
+
+			echo "Killing process with pid=$(<$CMD_PID) and all descendants"
+			pkill -P $(<$CMD_PID) # kill descendants
+			kill $(<$CMD_PID) # kill process itself
+
+			exit 1
+		fi
+	done
+}
+
+assume_available () {
+	VAR=$1
+	if [ -z "$VAR" ] ; then
+		echo "ERROR: Environment variable '$VAR' is not set but expected by watchdog.sh"
+		exit 1
+	fi
+}
+
+# =============================================
+# main function
+# =============================================
+
+# entrypoint
+function run_with_watchdog() {
+	local cmd="$1"
+
+	# check preconditions
+	assume_available CMD_OUT # used for writing the process output (to check for activity)
+	assume_available CMD_PID # location of file to write process id to
+	assume_available CMD_EXIT # location of file to writ exit code to
+	assume_available WATCHDOG_CALLBACK_ON_TIMEOUT # bash function to call on timeout
+
+	watchdog &

Review comment:
       I'm well aware what `CMD_PID` contains.
   
    I don't quite buy into it being an internal configuration when at the same time we allow it to be modified from the outside. If it is internal configuration there's no point in doing the assume_available checks.
   
   > There is probably no logical argument why the two approaches we are discussing are better or worse, because both approaches work.
   
   Meh. Clarity matters, especially when working on bash scripts. Using environment variables to pass around named arguments is common, and that's how I interpret `CMD_PID`. The same applies to `SLEEP_TIME` and `MAX_NO_OUTPUT`.
   If it is internal configuration then it shouldn't be modifiable; if it is external configuration then it should be explicitly passed around so you have a single place for checking preconditions, like assume_available.

##########
File path: tools/ci/test_controller.sh
##########
@@ -0,0 +1,127 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains generic control over the test execution.
+#
+
+HERE="`dirname \"$0\"`"             # relative
+HERE="`( cd \"$HERE\" && pwd )`"    # absolutized and normalized
+if [ -z "$HERE" ] ; then
+	exit 1
+fi
+
+source "${HERE}/stage.sh"
+source "${HERE}/maven-utils.sh"
+source "${HERE}/controller_utils.sh"
+source "${HERE}/watchdog.sh"
+TEST=$1
+
+# =============================================================================
+# Step 0: Check & print environment information & configure env
+# =============================================================================
+
+# check preconditions
+if [ -z "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable 'DEBUG_FILES' is not set but expected by test_controller.sh"

Review comment:
       I like the DEBUG_FILES prefix; maybe `DEBUG_FILES_OUTPUT_DIR`?

##########
File path: tools/ci/maven-utils.sh
##########
@@ -73,7 +73,7 @@ function collect_coredumps {
 	echo "Searching for .dump, .dumpstream and related files in '$SEARCHDIR'"
 	for file in `find $SEARCHDIR -type f -regextype posix-extended -iregex '.*\.hprof|.*\.dump|.*\.dumpstream|.*hs.*\.log|.*/core(.[0-9]+)?$'`; do
 		echo "Moving '$file' to target directory ('$TARGET_DIR')"
-		mv $file $TARGET_DIR/
+		mv $file $TARGET_DIR/$(echo $file | tr "/" "-")

Review comment:
       Hmm, not sure. Maybe this happens because the JVM writes that to STDERR. That's annoying :/
    
    This is probably fine then; I was initially worried we might get such long filenames that you wouldn't be able to extract the logs on Windows.

##########
File path: tools/ci/stage.sh
##########
@@ -158,6 +158,9 @@ function get_compile_modules_for_stage() {
             # the negation takes precedence, thus not all required modules would be built
             echo ""
         ;;
+        (${STAGE_PYTHON})
+            echo ""

Review comment:
       Sounds like `echo "-pl $flink-dist -am"` would be sufficient, though.

##########
File path: tools/ci/stage.sh
##########
@@ -158,6 +158,9 @@ function get_compile_modules_for_stage() {
             # the negation takes precedence, thus not all required modules would be built
             echo ""
         ;;
+        (${STAGE_PYTHON})
+            echo ""

Review comment:
       Sounds like `echo "-pl $flink-dist -am"` would be sufficient, though.
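    
    A condensed, hypothetical version of what the suggested branch in `tools/ci/stage.sh` could look like (the stray `$` from the comment above dropped; `STAGE_PYTHON` is assumed to be defined earlier in `stage.sh`, as in the PR):
    
    ```bash
    get_compile_modules_for_stage() {
        local stage="$1"
        case ${stage} in
            (${STAGE_PYTHON})
                # build only flink-dist and the modules it depends on
                echo "-pl flink-dist -am"
            ;;
            (*)
                echo ""    # default: build everything
            ;;
        esac
    }
    ```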

##########
File path: tools/azure-pipelines/debug_files_utils.sh
##########
@@ -0,0 +1,34 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+function debug_files_prepare {

Review comment:
       nit: odd naming scheme with the verb being the suffix

##########
File path: tools/azure-pipelines/jobs-template.yml
##########
@@ -64,14 +65,16 @@ jobs:
     displayName: "Set to jdk11"
     condition: eq('${{parameters.jdk}}', 'jdk11')
   # Compile
-  - script: STAGE=compile ${{parameters.environment}} ./tools/azure_controller.sh compile
-    displayName: Build
+  - script: |
+      ${{parameters.environment}} ./tools/ci/compile.sh || exit $?
+      ./tools/azure-pipelines/create_build_artifact.sh
+    displayName: Compile
 
   # upload artifacts for next stage
   - task: PublishPipelineArtifact@1
     inputs:
-      path: $(CACHE_FLINK_DIR)

Review comment:
       so is `path` like a deprecated parameter?

##########
File path: flink-end-to-end-tests/run-nightly-tests.sh
##########
@@ -235,15 +235,15 @@ printf "Running Java end-to-end tests\n"
 printf "==============================================================================\n"
 
 
-LOG4J_PROPERTIES=${END_TO_END_DIR}/../tools/log4j-travis.properties
+LOG4J_PROPERTIES=${END_TO_END_DIR}/../tools/ci/log4j-ci.properties

Review comment:
       nit: the "ci" suffix could probably be removed, given that the directory already contains "ci".

##########
File path: tools/ci/controller_utils.sh
##########
@@ -0,0 +1,60 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+print_system_info() {
+    echo "CPU information"
+    lscpu
+
+    echo "Memory information"
+    cat /proc/meminfo
+
+    echo "Disk information"
+    df -hH
+
+    echo "Running build as"
+    whoami
+}
+
+# locate YARN logs and put them into artifacts directory
+put_yarn_logs_to_artifacts() {
+	for file in `find ./flink-yarn-tests/target -type f -name '*.log'`; do
+		TARGET_FILE=`echo "$file" | grep -Eo "container_[0-9_]+/(.*).log"`
+		TARGET_DIR=`dirname	 "$TARGET_FILE"`
+		mkdir -p "$DEBUG_FILES_OUTPUT_DIR/yarn-tests/$TARGET_DIR"
+		cp $file "$DEBUG_FILES_OUTPUT_DIR/yarn-tests/$TARGET_FILE"
+	done
+}
+
+print_stacktraces () {
+	echo "=============================================================================="
+	echo "The following Java processes are running (JPS)"
+	echo "=============================================================================="
+
+	jps

Review comment:
       I guess this is for debug purposes? It may be better to store the result in a variable, and then echo/awk it. Otherwise two invocations of jps could return different sets of processes, which could be misleading.
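
       A rough sketch of that approach (the per-PID jstack loop is an assumption about what the rest of the function does):
    ```bash
    # capture the process list once, so the printed overview and any follow-up
    # per-PID actions operate on the same snapshot (sketch, not the final code)
    JPS_OUTPUT=$(jps)
    echo "$JPS_OUTPUT"

    for pid in $(echo "$JPS_OUTPUT" | awk '{print $1}'); do
        jstack "$pid" || true   # the process may have exited in the meantime
    done
    ```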




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot commented on pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #12268:
URL: https://github.com/apache/flink/pull/12268#issuecomment-631497626


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 4ed6888375869e654816264124703e72439c6148 (Wed May 20 14:09:51 UTC 2020)
   
   **Warnings:**
    * Documentation files were touched, but no `.zh.md` files: Update Chinese documentation or file Jira ticket.
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.<details>
    The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] rmetzger commented on a change in pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #12268:
URL: https://github.com/apache/flink/pull/12268#discussion_r429905990



##########
File path: tools/ci/watchdog.sh
##########
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains a watchdog tool to run monitor and potentially kill tasks
+# not producing any output for n seconds.
+#
+
+# Number of seconds w/o output before printing a stack trace and killing the watched process
+MAX_NO_OUTPUT=${MAX_NO_OUTPUT:-900}
+
+# Number of seconds to sleep before checking the output again
+SLEEP_TIME=${SLEEP_TIME:-20}
+
+CMD_OUT=${CMD_OUT:-"/tmp/watchdog.out"}
+CMD_PID=${CMD_PID:-"/tmp/watchdog.pid"}
+CMD_EXIT=${CMD_EXIT:-"/tmp/watchdog.exit"}

Review comment:
       No, but I see no harm in making them configurable.
   When I was changing this, I still assumed there was a reason why we used different files for the python tests.
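
       For illustration, a python-specific invocation could then override the defaults along these lines (the file names are made up):
    ```bash
    # hypothetical caller override; watchdog.sh only falls back to the /tmp/watchdog.*
    # defaults when these variables are not already set in the environment
    export CMD_OUT="/tmp/python_watchdog.out"
    export CMD_PID="/tmp/python_watchdog.pid"
    export CMD_EXIT="/tmp/python_watchdog.exit"
    source ./tools/ci/watchdog.sh
    run_with_watchdog "./flink-python/dev/lint-python.sh"
    ```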

##########
File path: tools/ci/watchdog.sh
##########
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains a watchdog tool to run monitor and potentially kill tasks
+# not producing any output for n seconds.
+#
+
+# Number of seconds w/o output before printing a stack trace and killing the watched process
+MAX_NO_OUTPUT=${MAX_NO_OUTPUT:-900}
+
+# Number of seconds to sleep before checking the output again
+SLEEP_TIME=${SLEEP_TIME:-20}
+
+CMD_OUT=${CMD_OUT:-"/tmp/watchdog.out"}
+CMD_PID=${CMD_PID:-"/tmp/watchdog.pid"}
+CMD_EXIT=${CMD_EXIT:-"/tmp/watchdog.exit"}

Review comment:
       Let me know if you want me to change this




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12268:
URL: https://github.com/apache/flink/pull/12268#issuecomment-631512695






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12268:
URL: https://github.com/apache/flink/pull/12268#issuecomment-631512695


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4ed6888375869e654816264124703e72439c6148",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1955",
       "triggerID" : "4ed6888375869e654816264124703e72439c6148",
       "triggerType" : "PUSH"
     }, {
       "hash" : "76f1c9f6921d3453e476dba3be53860b53a489a6",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "76f1c9f6921d3453e476dba3be53860b53a489a6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4ed6888375869e654816264124703e72439c6148 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1955) 
   * 76f1c9f6921d3453e476dba3be53860b53a489a6 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] rmetzger commented on a change in pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #12268:
URL: https://github.com/apache/flink/pull/12268#discussion_r429918956



##########
File path: tools/azure-pipelines/build-python-wheels.yml
##########
@@ -24,7 +24,6 @@ jobs:
       # Compile
       - script: |
           ${{parameters.environment}} ./tools/ci/compile.sh
-          ./tools/azure-pipelines/create_build_artifact.sh

Review comment:
       `build-python-wheels.yml` never relied on this script. I don't fully understand this comment.
   The excludes in `create_build_artifact.sh` are for reducing the file size because those files are not needed?!
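
       Purely for illustration, this is the kind of size trimming meant here (the paths and patterns are made up, not what the script actually excludes):
    ```bash
    # hypothetical sketch: drop files the test stages do not need before uploading
    tar -czf "$FLINK_ARTIFACT_DIR/flink.tar.gz" \
        --exclude='./docs' \
        --exclude='*-sources.jar' \
        --exclude='*-javadoc.jar' \
        .
    ```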




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] rmetzger commented on a change in pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #12268:
URL: https://github.com/apache/flink/pull/12268#discussion_r429909137



##########
File path: tools/azure-pipelines/jobs-template.yml
##########
@@ -121,15 +128,34 @@ jobs:
     continueOnError: true # continue the build even if the cache fails.
     condition: not(eq('${{parameters.test_pool_definition.name}}', 'Default'))
     displayName: Cache Maven local repo
+
   - script: |
       echo "##vso[task.setvariable variable=JAVA_HOME]$JAVA_HOME_11_X64"
       echo "##vso[task.setvariable variable=PATH]$JAVA_HOME_11_X64/bin:$PATH"
     displayName: "Set to jdk11"
     condition: eq('${{parameters.jdk}}', 'jdk11')  
+
   - script: sudo sysctl -w kernel.core_pattern=core.%p
     displayName: Set coredump pattern
+
   # Test
-  - script: STAGE=test ${{parameters.environment}} ./tools/azure-pipelines/azure_controller.sh $(module)
+  - script: |
+      ./tools/azure-pipelines/unpack_build_artifact.sh
+      export DEBUG_FILES="$AGENT_TEMPDIRECTORY/debug_files"

Review comment:
       I considered it, but I had the feeling that these steps are still short enough to keep them in the template.yml.
   If you think it would improve readability, I can move them into a script.
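
       If we do, a wrapper could look roughly like this (name, location and the exact set of steps are assumptions):
    ```bash
    #!/usr/bin/env bash
    # hypothetical tools/azure-pipelines/run_test_stage.sh bundling the inline steps
    set -e

    ./tools/azure-pipelines/unpack_build_artifact.sh

    export DEBUG_FILES="$AGENT_TEMPDIRECTORY/debug_files"
    mkdir -p "$DEBUG_FILES"

    ./tools/ci/test_controller.sh "$1"
    ```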




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] zentol commented on a change in pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #12268:
URL: https://github.com/apache/flink/pull/12268#discussion_r429879700



##########
File path: tools/ci/watchdog.sh
##########
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains a watchdog tool to run monitor and potentially kill tasks
+# not producing any output for n seconds.
+#
+
+# Number of seconds w/o output before printing a stack trace and killing the watched process
+MAX_NO_OUTPUT=${MAX_NO_OUTPUT:-900}
+
+# Number of seconds to sleep before checking the output again
+SLEEP_TIME=${SLEEP_TIME:-20}
+
+CMD_OUT=${CMD_OUT:-"/tmp/watchdog.out"}
+CMD_PID=${CMD_PID:-"/tmp/watchdog.pid"}
+CMD_EXIT=${CMD_EXIT:-"/tmp/watchdog.exit"}

Review comment:
       are these actually set anywhere?

##########
File path: tools/ci/test_controller.sh
##########
@@ -0,0 +1,127 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains generic control over the test execution.
+#
+
+HERE="`dirname \"$0\"`"             # relative
+HERE="`( cd \"$HERE\" && pwd )`"    # absolutized and normalized
+if [ -z "$HERE" ] ; then
+	exit 1
+fi
+
+source "${HERE}/stage.sh"
+source "${HERE}/maven-utils.sh"
+source "${HERE}/controller_utils.sh"
+source "${HERE}/watchdog.sh"
+TEST=$1
+
+# =============================================================================
+# Step 0: Check & print environment information & configure env
+# =============================================================================
+
+# check preconditions
+if [ -z "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable 'DEBUG_FILES' is not set but expected by test_controller.sh"
+	exit 1
+fi
+
+if [ ! -d "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable DEBUG_FILES=$DEBUG_FILES points to a directory that does not exist"
+	exit 1
+fi
+
+if [ -z "$TEST" ] ; then
+	echo "ERROR: Environment variable 'TEST' is not set but expected by test_controller.sh"
+	exit 1
+fi
+
+echo "Printing environment information"
+
+echo "PATH=$PATH"
+run_mvn -version
+echo "Commit: $(git rev-parse HEAD)"
+print_system_info
+
+# enable coredumps for this process
+ulimit -c unlimited
+
+# configure JVMs to produce heap dumps
+export JAVA_TOOL_OPTIONS="-XX:+HeapDumpOnOutOfMemoryError"
+
+# some tests provide additional logs if they find this variable
+export IS_CI=true
+
+# =============================================================================
+# Step 1: Compile Flink (again)
+# =============================================================================
+
+WATCHDOG_CALLBACK_ON_TIMEOUT="print_stacktraces | tee ${DEBUG_FILES}/jps-traces.out"
+
+LOG4J_PROPERTIES=${HERE}/log4j-ci.properties
+MVN_LOGGING_OPTIONS="-Dlog.dir=${DEBUG_FILES} -Dlog4j.configurationFile=file://$LOG4J_PROPERTIES"
+# Maven command to run. We set the forkCount manually, because otherwise Maven sees too many cores
+# on some CI environments. Set forkCountTestPackage to 1 for container-based environment (4 GiB memory)
+# and 2 for sudo-enabled environment (7.5 GiB memory).
+MVN_COMMON_OPTIONS="-Dflink.forkCount=2 -Dflink.forkCountTestPackage=2 -Dfast -Pskip-webui-build $MVN_LOGGING_OPTIONS"
+MVN_COMPILE_OPTIONS="-DskipTests"
+MVN_COMPILE_MODULES=$(get_compile_modules_for_stage ${TEST})
+
+run_with_watchdog "run_mvn $MVN_COMMON_OPTIONS $MVN_COMPILE_OPTIONS $PROFILE $MVN_COMPILE_MODULES install"
+EXIT_CODE=$?
+
+if [ $EXIT_CODE != 0 ]; then
+	echo "=============================================================================="
+	echo "Compilation failure detected, skipping test execution."
+	echo "=============================================================================="
+	exit $EXIT_CODE
+fi
+
+
+# =============================================================================
+# Step 2: Run tests
+# =============================================================================
+
+if [ $TEST == $STAGE_PYTHON ]; then
+	run_with_watchdog "./flink-python/dev/lint-python.sh"
+	EXIT_CODE=$?
+else
+	MVN_TEST_OPTIONS="-Dflink.tests.with-openssl"
+	MVN_TEST_MODULES=$(get_test_modules_for_stage ${TEST})
+
+	run_with_watchdog "run_mvn $MVN_COMMON_OPTIONS $MVN_TEST_OPTIONS $PROFILE $MVN_TEST_MODULES verify"
+	EXIT_CODE=$?
+fi
+
+# =============================================================================
+# Step 3: Put extra logs into $DEBUG_FILES
+# =============================================================================
+
+# only misc builds flink-yarn-tests
+case $TEST in
+	(misc)
+		put_yarn_logs_to_artifacts
+	;;
+esac
+
+collect_coredumps `pwd` $DEBUG_FILES

Review comment:
       nit: Use `$(pwd)` for consistency/clarity?

##########
File path: tools/ci/test_controller.sh
##########
@@ -0,0 +1,127 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains generic control over the test execution.
+#
+
+HERE="`dirname \"$0\"`"             # relative
+HERE="`( cd \"$HERE\" && pwd )`"    # absolutized and normalized
+if [ -z "$HERE" ] ; then
+	exit 1
+fi
+
+source "${HERE}/stage.sh"
+source "${HERE}/maven-utils.sh"
+source "${HERE}/controller_utils.sh"
+source "${HERE}/watchdog.sh"
+TEST=$1
+
+# =============================================================================
+# Step 0: Check & print environment information & configure env
+# =============================================================================
+
+# check preconditions
+if [ -z "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable 'DEBUG_FILES' is not set but expected by test_controller.sh"
+	exit 1
+fi
+
+if [ ! -d "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable DEBUG_FILES=$DEBUG_FILES points to a directory that does not exist"
+	exit 1
+fi
+
+if [ -z "$TEST" ] ; then
+	echo "ERROR: Environment variable 'TEST' is not set but expected by test_controller.sh"
+	exit 1
+fi
+
+echo "Printing environment information"
+
+echo "PATH=$PATH"
+run_mvn -version
+echo "Commit: $(git rev-parse HEAD)"
+print_system_info
+
+# enable coredumps for this process
+ulimit -c unlimited
+
+# configure JVMs to produce heap dumps
+export JAVA_TOOL_OPTIONS="-XX:+HeapDumpOnOutOfMemoryError"
+
+# some tests provide additional logs if they find this variable
+export IS_CI=true
+
+# =============================================================================
+# Step 1: Compile Flink (again)
+# =============================================================================
+
+WATCHDOG_CALLBACK_ON_TIMEOUT="print_stacktraces | tee ${DEBUG_FILES}/jps-traces.out"

Review comment:
       move this right above `run_with_watchdog`

##########
File path: tools/azure-pipelines/build-python-wheels.yml
##########
@@ -33,14 +32,13 @@ jobs:
         name: set_version
 
       - script: |
-          cd $(Pipeline.Workspace)
-          tar czf $(Pipeline.Workspace)/flink.tar.gz flink_cache/flink-dist/target/flink-$(set_version.VERSION)-bin/flink-$(set_version.VERSION)
+          tar czf $(Pipeline.Workspace)/flink.tar.gz flink-dist/target/flink-$(set_version.VERSION)-bin/flink-$(set_version.VERSION)
         displayName: Compress in tgz
 
       # upload artifacts for building wheels
       - task: PublishPipelineArtifact@1
         inputs:
-          path: $(FLINK_ARTIFACT_DIR)
+          targetPath: $(Pipeline.Workspace)/flink.tar.gz

Review comment:
       why is this set to `targetPath`? jobs-template.yml uses `path`

##########
File path: tools/ci/watchdog.sh
##########
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains a watchdog tool to run monitor and potentially kill tasks
+# not producing any output for n seconds.
+#
+
+# Number of seconds w/o output before printing a stack trace and killing the watched process
+MAX_NO_OUTPUT=${MAX_NO_OUTPUT:-900}
+
+# Number of seconds to sleep before checking the output again
+SLEEP_TIME=${SLEEP_TIME:-20}
+
+CMD_OUT=${CMD_OUT:-"/tmp/watchdog.out"}
+CMD_PID=${CMD_PID:-"/tmp/watchdog.pid"}
+CMD_EXIT=${CMD_EXIT:-"/tmp/watchdog.exit"}
+
+
+# =============================================
+# Utility functions
+# ============================================= 
+
+mod_time () {
+	echo `stat -c "%Y" $CMD_OUT`
+}
+
+the_time() {
+	echo `date +%s`
+}
+
+# watchdog process
+
+watchdog () {
+	touch $CMD_OUT
+
+	while true; do
+		sleep $SLEEP_TIME
+
+		time_diff=$((`the_time` - `mod_time`))
+
+		if [ $time_diff -ge $MAX_NO_OUTPUT ]; then
+			echo "=============================================================================="
+			echo "Process produced no output for ${MAX_NO_OUTPUT} seconds."
+			echo "=============================================================================="
+
+			# run timeout callback
+			$WATCHDOG_CALLBACK_ON_TIMEOUT
+
+			echo "Killing process with pid=$(<$CMD_PID) and all descendants"
+			pkill -P $(<$CMD_PID) # kill descendants
+			kill $(<$CMD_PID) # kill process itself
+
+			exit 1
+		fi
+	done
+}
+
+assume_available () {
+	VAR=$1
+	if [ -z "$VAR" ] ; then
+		echo "ERROR: Environment variable '$VAR' is not set but expected by watchdog.sh"
+		exit 1
+	fi
+}
+
+# =============================================
+# main function
+# =============================================
+
+# entrypoint
+function run_with_watchdog() {
+	local cmd="$1"
+
+	# check preconditions
+	assume_available CMD_OUT # used for writing the process output (to check for activity)
+	assume_available CMD_PID # location of file to write process id to
+	assume_available CMD_EXIT # location of file to write exit code to
+	assume_available WATCHDOG_CALLBACK_ON_TIMEOUT # bash function to call on timeout
+
+	watchdog &

Review comment:
       I'd prefer explicitly passing the required arguments.
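
       A sketch of what explicit parameters could look like (the signature is only a proposal):
    ```bash
    # simplified sketch: all inputs are visible at the call site instead of being globals
    watchdog () {
        local cmd_out="$1" cmd_pid="$2" max_no_output="$3" sleep_time="$4"
        touch "$cmd_out"
        while true; do
            sleep "$sleep_time"
            local time_diff=$(( $(date +%s) - $(stat -c "%Y" "$cmd_out") ))
            if [ "$time_diff" -ge "$max_no_output" ]; then
                echo "Process produced no output for ${max_no_output} seconds, killing $(<"$cmd_pid") and descendants"
                pkill -P "$(<"$cmd_pid")"
                kill "$(<"$cmd_pid")"
                exit 1
            fi
        done
    }

    watchdog "$CMD_OUT" "$CMD_PID" "$MAX_NO_OUTPUT" "$SLEEP_TIME" &
    ```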

##########
File path: tools/ci/watchdog.sh
##########
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains a watchdog tool to run monitor and potentially kill tasks

Review comment:
       ```suggestion
   # This file contains a watchdog tool to monitor a task and potentially kill it after
   ```

##########
File path: tools/ci/test_controller.sh
##########
@@ -0,0 +1,127 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains generic control over the test execution.
+#
+
+HERE="`dirname \"$0\"`"             # relative
+HERE="`( cd \"$HERE\" && pwd )`"    # absolutized and normalized
+if [ -z "$HERE" ] ; then
+	exit 1
+fi
+
+source "${HERE}/stage.sh"
+source "${HERE}/maven-utils.sh"
+source "${HERE}/controller_utils.sh"
+source "${HERE}/watchdog.sh"
+TEST=$1
+
+# =============================================================================
+# Step 0: Check & print environment information & configure env
+# =============================================================================
+
+# check preconditions
+if [ -z "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable 'DEBUG_FILES' is not set but expected by test_controller.sh"
+	exit 1
+fi
+
+if [ ! -d "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable DEBUG_FILES=$DEBUG_FILES points to a directory that does not exist"
+	exit 1
+fi
+
+if [ -z "$TEST" ] ; then
+	echo "ERROR: Environment variable 'TEST' is not set but expected by test_controller.sh"

Review comment:
       same as above

##########
File path: tools/ci/stage.sh
##########
@@ -158,6 +158,9 @@ function get_compile_modules_for_stage() {
             # the negation takes precedence, thus not all required modules would be built
             echo ""
         ;;
+        (${STAGE_PYTHON})
+            echo ""

Review comment:
       add a comment for why we compile everything

##########
File path: tools/ci/test_controller.sh
##########
@@ -0,0 +1,127 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains generic control over the test execution.
+#
+
+HERE="`dirname \"$0\"`"             # relative
+HERE="`( cd \"$HERE\" && pwd )`"    # absolutized and normalized
+if [ -z "$HERE" ] ; then
+	exit 1
+fi
+
+source "${HERE}/stage.sh"
+source "${HERE}/maven-utils.sh"
+source "${HERE}/controller_utils.sh"
+source "${HERE}/watchdog.sh"
+TEST=$1
+
+# =============================================================================
+# Step 0: Check & print environment information & configure env
+# =============================================================================
+
+# check preconditions
+if [ -z "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable 'DEBUG_FILES' is not set but expected by test_controller.sh"
+	exit 1
+fi
+
+if [ ! -d "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable DEBUG_FILES=$DEBUG_FILES points to a directory that does not exist"
+	exit 1
+fi
+
+if [ -z "$TEST" ] ; then
+	echo "ERROR: Environment variable 'TEST' is not set but expected by test_controller.sh"
+	exit 1
+fi
+
+echo "Printing environment information"
+
+echo "PATH=$PATH"
+run_mvn -version
+echo "Commit: $(git rev-parse HEAD)"
+print_system_info
+
+# enable coredumps for this process
+ulimit -c unlimited
+
+# configure JVMs to produce heap dumps
+export JAVA_TOOL_OPTIONS="-XX:+HeapDumpOnOutOfMemoryError"
+
+# some tests provide additional logs if they find this variable
+export IS_CI=true
+
+# =============================================================================
+# Step 1: Compile Flink (again)

Review comment:
       we usually aren't compiling Flink again, but are just rebuilding the jars and installing it into the maven repository.

##########
File path: tools/ci/test_controller.sh
##########
@@ -0,0 +1,127 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains generic control over the test execution.
+#
+
+HERE="`dirname \"$0\"`"             # relative
+HERE="`( cd \"$HERE\" && pwd )`"    # absolutized and normalized
+if [ -z "$HERE" ] ; then
+	exit 1
+fi
+
+source "${HERE}/stage.sh"
+source "${HERE}/maven-utils.sh"
+source "${HERE}/controller_utils.sh"
+source "${HERE}/watchdog.sh"
+TEST=$1
+
+# =============================================================================
+# Step 0: Check & print environment information & configure env
+# =============================================================================
+
+# check preconditions
+if [ -z "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable 'DEBUG_FILES' is not set but expected by test_controller.sh"

Review comment:
       add more info on what this is used for; and potentially adjust the name because it is quite non-descriptive

##########
File path: tools/ci/test_controller.sh
##########
@@ -0,0 +1,127 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains generic control over the test execution.
+#
+
+HERE="`dirname \"$0\"`"             # relative
+HERE="`( cd \"$HERE\" && pwd )`"    # absolutized and normalized
+if [ -z "$HERE" ] ; then
+	exit 1
+fi
+
+source "${HERE}/stage.sh"
+source "${HERE}/maven-utils.sh"
+source "${HERE}/controller_utils.sh"
+source "${HERE}/watchdog.sh"
+TEST=$1
+
+# =============================================================================
+# Step 0: Check & print environment information & configure env
+# =============================================================================
+
+# check preconditions
+if [ -z "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable 'DEBUG_FILES' is not set but expected by test_controller.sh"
+	exit 1
+fi
+
+if [ ! -d "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable DEBUG_FILES=$DEBUG_FILES points to a directory that does not exist"
+	exit 1
+fi
+
+if [ -z "$TEST" ] ; then
+	echo "ERROR: Environment variable 'TEST' is not set but expected by test_controller.sh"
+	exit 1
+fi
+
+echo "Printing environment information"
+
+echo "PATH=$PATH"
+run_mvn -version
+echo "Commit: $(git rev-parse HEAD)"
+print_system_info
+
+# enable coredumps for this process
+ulimit -c unlimited
+
+# configure JVMs to produce heap dumps
+export JAVA_TOOL_OPTIONS="-XX:+HeapDumpOnOutOfMemoryError"
+
+# some tests provide additional logs if they find this variable
+export IS_CI=true
+
+# =============================================================================
+# Step 1: Compile Flink (again)
+# =============================================================================
+
+WATCHDOG_CALLBACK_ON_TIMEOUT="print_stacktraces | tee ${DEBUG_FILES}/jps-traces.out"
+
+LOG4J_PROPERTIES=${HERE}/log4j-ci.properties
+MVN_LOGGING_OPTIONS="-Dlog.dir=${DEBUG_FILES} -Dlog4j.configurationFile=file://$LOG4J_PROPERTIES"
+# Maven command to run. We set the forkCount manually, because otherwise Maven sees too many cores
+# on some CI environments. Set forkCountTestPackage to 1 for container-based environment (4 GiB memory)

Review comment:
       the information about available memory is likely out-dated.

##########
File path: tools/azure-pipelines/jobs-template.yml
##########
@@ -163,6 +163,7 @@ jobs:
       IT_CASE_S3_SECRET_KEY: $(SECRET_S3_SECRET_KEY)
 
   - task: PublishTestResults@2
+    condition: succeededOrFailed()

Review comment:
       what happens if, say, the compilation fails? Is this case handled gracefully?

##########
File path: tools/azure-pipelines/jobs-template.yml
##########
@@ -64,14 +65,20 @@ jobs:
     displayName: "Set to jdk11"
     condition: eq('${{parameters.jdk}}', 'jdk11')
   # Compile
-  - script: STAGE=compile ${{parameters.environment}} ./tools/azure-pipelines/azure_controller.sh compile
-    displayName: Build
+  - script: |
+      ${{parameters.environment}} ./tools/ci/compile.sh
+      EXIT_CODE=$?

Review comment:
       same as below

##########
File path: tools/ci/maven-utils.sh
##########
@@ -73,7 +73,7 @@ function collect_coredumps {
 	echo "Searching for .dump, .dumpstream and related files in '$SEARCHDIR'"
 	for file in `find $SEARCHDIR -type f -regextype posix-extended -iregex '.*\.hprof|.*\.dump|.*\.dumpstream|.*hs.*\.log|.*/core(.[0-9]+)?$'`; do
 		echo "Moving '$file' to target directory ('$TARGET_DIR')"
-		mv $file $TARGET_DIR/
+		mv $file $TARGET_DIR/$(echo $file | tr "/" "-")

Review comment:
       shouldn't be possible to have forward slashes in file names on unix

##########
File path: flink-yarn-tests/src/test/java/org/apache/flink/yarn/YarnTestBase.java
##########
@@ -1068,8 +1067,8 @@ public static void teardown() throws Exception {
 
 	}
 
-	public static boolean isOnTravis() {
-		return System.getenv("TRAVIS") != null && System.getenv("TRAVIS").equals("true");
+	public static boolean isOnCI() {
+		return System.getenv("IS_CI") != null && System.getenv("IS_CI").equals("true");

Review comment:
       hmm, this reminds me, we aren't caching downloads for the e2e tests on azure, are we?
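
       For reference, caching such downloads would presumably look something like this (key and path are guesses, not an existing setup):
    ```yaml
    # hypothetical sketch only
    - task: Cache@2
      inputs:
        key: 'e2e_downloads | "$(Agent.OS)"'
        path: $(Pipeline.Workspace)/e2e_download_cache
      displayName: Cache e2e test downloads
    ```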

##########
File path: tools/ci/watchdog.sh
##########
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains a watchdog tool to run monitor and potentially kill tasks
+# not producing any output for n seconds.
+#
+
+# Number of seconds w/o output before printing a stack trace and killing the watched process
+MAX_NO_OUTPUT=${MAX_NO_OUTPUT:-900}
+
+# Number of seconds to sleep before checking the output again
+SLEEP_TIME=${SLEEP_TIME:-20}
+
+CMD_OUT=${CMD_OUT:-"/tmp/watchdog.out"}
+CMD_PID=${CMD_PID:-"/tmp/watchdog.pid"}
+CMD_EXIT=${CMD_EXIT:-"/tmp/watchdog.exit"}
+
+
+# =============================================
+# Utility functions
+# ============================================= 
+
+mod_time () {
+	echo `stat -c "%Y" $CMD_OUT`
+}
+
+the_time() {
+	echo `date +%s`
+}
+
+# watchdog process
+
+watchdog () {
+	touch $CMD_OUT
+
+	while true; do
+		sleep $SLEEP_TIME
+
+		time_diff=$((`the_time` - `mod_time`))
+
+		if [ $time_diff -ge $MAX_NO_OUTPUT ]; then
+			echo "=============================================================================="
+			echo "Process produced no output for ${MAX_NO_OUTPUT} seconds."
+			echo "=============================================================================="
+
+			# run timeout callback
+			$WATCHDOG_CALLBACK_ON_TIMEOUT
+
+			echo "Killing process with pid=$(<$CMD_PID) and all descendants"
+			pkill -P $(<$CMD_PID) # kill descendants
+			kill $(<$CMD_PID) # kill process itself
+
+			exit 1
+		fi
+	done
+}
+
+assume_available () {
+	VAR=$1
+	if [ -z "$VAR" ] ; then
+		echo "ERROR: Environment variable '$VAR' is not set but expected by watchdog.sh"
+		exit 1
+	fi
+}
+
+# =============================================
+# main function
+# =============================================
+
+# entrypoint
+function run_with_watchdog() {
+	local cmd="$1"
+
+	# check preconditions
+	assume_available CMD_OUT # used for writing the process output (to check for activity)
+	assume_available CMD_PID # location of file to write process id to
+	assume_available CMD_EXIT # location of file to write exit code to
+	assume_available WATCHDOG_CALLBACK_ON_TIMEOUT # bash function to call on timeout

Review comment:
       if you passed the comment into assume_available you could provide a better error message
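
       A sketch of that, which also checks the variable's value via indirect expansion (the current version tests `$VAR`, i.e. the passed name itself, which is never empty):
    ```bash
    # sketch: take the variable name plus a human-readable description
    assume_available () {
        local var_name="$1"
        local description="$2"
        if [ -z "${!var_name}" ] ; then
            echo "ERROR: Environment variable '$var_name' ($description) is not set but expected by watchdog.sh"
            exit 1
        fi
    }

    assume_available CMD_OUT "file capturing the process output, used to check for activity"
    assume_available CMD_PID "file the watched process id is written to"
    ```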

##########
File path: tools/azure-pipelines/jobs-template.yml
##########
@@ -121,15 +128,34 @@ jobs:
     continueOnError: true # continue the build even if the cache fails.
     condition: not(eq('${{parameters.test_pool_definition.name}}', 'Default'))
     displayName: Cache Maven local repo
+
   - script: |
       echo "##vso[task.setvariable variable=JAVA_HOME]$JAVA_HOME_11_X64"
       echo "##vso[task.setvariable variable=PATH]$JAVA_HOME_11_X64/bin:$PATH"
     displayName: "Set to jdk11"
     condition: eq('${{parameters.jdk}}', 'jdk11')  
+
   - script: sudo sysctl -w kernel.core_pattern=core.%p
     displayName: Set coredump pattern
+
   # Test
-  - script: STAGE=test ${{parameters.environment}} ./tools/azure-pipelines/azure_controller.sh $(module)
+  - script: |
+      ./tools/azure-pipelines/unpack_build_artifact.sh
+      export DEBUG_FILES="$AGENT_TEMPDIRECTORY/debug_files"

Review comment:
       why isn't all this put into a script?

##########
File path: tools/azure-pipelines/build-python-wheels.yml
##########
@@ -24,7 +24,6 @@ jobs:
       # Compile
       - script: |
           ${{parameters.environment}} ./tools/ci/compile.sh
-          ./tools/azure-pipelines/create_build_artifact.sh

Review comment:
       If python no longer relies on this then a number of exclusions in `create_build_artifact.sh` could probably be removed. Maybe even all of them.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12268:
URL: https://github.com/apache/flink/pull/12268#issuecomment-631512695


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "4ed6888375869e654816264124703e72439c6148",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1955",
       "triggerID" : "4ed6888375869e654816264124703e72439c6148",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 4ed6888375869e654816264124703e72439c6148 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1955) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] rmetzger commented on pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
rmetzger commented on pull request #12268:
URL: https://github.com/apache/flink/pull/12268#issuecomment-633895820


   Thanks a lot for your comments & sorry for the back and forth.
   
   I believe I addressed all open comments.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] rmetzger commented on a change in pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #12268:
URL: https://github.com/apache/flink/pull/12268#discussion_r429918192



##########
File path: tools/ci/test_controller.sh
##########
@@ -0,0 +1,127 @@
+#!/usr/bin/env bash
+################################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+################################################################################
+
+#
+# This file contains generic control over the test execution.
+#
+
+HERE="`dirname \"$0\"`"             # relative
+HERE="`( cd \"$HERE\" && pwd )`"    # absolutized and normalized
+if [ -z "$HERE" ] ; then
+	exit 1
+fi
+
+source "${HERE}/stage.sh"
+source "${HERE}/maven-utils.sh"
+source "${HERE}/controller_utils.sh"
+source "${HERE}/watchdog.sh"
+TEST=$1
+
+# =============================================================================
+# Step 0: Check & print environment information & configure env
+# =============================================================================
+
+# check preconditions
+if [ -z "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable 'DEBUG_FILES' is not set but expected by test_controller.sh"
+	exit 1
+fi
+
+if [ ! -d "$DEBUG_FILES" ] ; then
+	echo "ERROR: Environment variable DEBUG_FILES=$DEBUG_FILES points to a directory that does not exist"
+	exit 1
+fi
+
+if [ -z "$TEST" ] ; then
+	echo "ERROR: Environment variable 'TEST' is not set but expected by test_controller.sh"
+	exit 1
+fi
+
+echo "Printing environment information"
+
+echo "PATH=$PATH"
+run_mvn -version
+echo "Commit: $(git rev-parse HEAD)"
+print_system_info
+
+# enable coredumps for this process
+ulimit -c unlimited
+
+# configure JVMs to produce heap dumps
+export JAVA_TOOL_OPTIONS="-XX:+HeapDumpOnOutOfMemoryError"
+
+# some tests provide additional logs if they find this variable
+export IS_CI=true
+
+# =============================================================================
+# Step 1: Compile Flink (again)
+# =============================================================================
+
+WATCHDOG_CALLBACK_ON_TIMEOUT="print_stacktraces | tee ${DEBUG_FILES}/jps-traces.out"
+
+LOG4J_PROPERTIES=${HERE}/log4j-ci.properties
+MVN_LOGGING_OPTIONS="-Dlog.dir=${DEBUG_FILES} -Dlog4j.configurationFile=file://$LOG4J_PROPERTIES"
+# Maven command to run. We set the forkCount manually, because otherwise Maven sees too many cores
+# on some CI environments. Set forkCountTestPackage to 1 for container-based environment (4 GiB memory)

Review comment:
       Will remove it




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] rmetzger commented on a change in pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #12268:
URL: https://github.com/apache/flink/pull/12268#discussion_r429912828



##########
File path: tools/azure-pipelines/build-python-wheels.yml
##########
@@ -33,14 +32,13 @@ jobs:
         name: set_version
 
       - script: |
-          cd $(Pipeline.Workspace)
-          tar czf $(Pipeline.Workspace)/flink.tar.gz flink_cache/flink-dist/target/flink-$(set_version.VERSION)-bin/flink-$(set_version.VERSION)
+          tar czf $(Pipeline.Workspace)/flink.tar.gz flink-dist/target/flink-$(set_version.VERSION)-bin/flink-$(set_version.VERSION)
         displayName: Compress in tgz
 
       # upload artifacts for building wheels
       - task: PublishPipelineArtifact@1
         inputs:
-          path: $(FLINK_ARTIFACT_DIR)
+          targetPath: $(Pipeline.Workspace)/flink.tar.gz

Review comment:
       It seems that `targetPath` is the right input: https://docs.microsoft.com/en-us/azure/devops/pipelines/tasks/utility/publish-pipeline-artifact?view=azure-devops.
   I will update the template.
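
       So the equivalent snippet in jobs-template.yml would become something like (the variable name is illustrative):
    ```yaml
    # hypothetical: same change applied to jobs-template.yml, using the documented input name
    - task: PublishPipelineArtifact@1
      inputs:
        targetPath: $(FLINK_ARTIFACT_DIR)
    ```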




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] rmetzger commented on a change in pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
rmetzger commented on a change in pull request #12268:
URL: https://github.com/apache/flink/pull/12268#discussion_r429917886



##########
File path: tools/ci/maven-utils.sh
##########
@@ -73,7 +73,7 @@ function collect_coredumps {
 	echo "Searching for .dump, .dumpstream and related files in '$SEARCHDIR'"
 	for file in `find $SEARCHDIR -type f -regextype posix-extended -iregex '.*\.hprof|.*\.dump|.*\.dumpstream|.*hs.*\.log|.*/core(.[0-9]+)?$'`; do
 		echo "Moving '$file' to target directory ('$TARGET_DIR')"
-		mv $file $TARGET_DIR/
+		mv $file $TARGET_DIR/$(echo $file | tr "/" "-")

Review comment:
       The problem I'm solving here is the following: Before this change, the YARN test profile has the following output:
   ```
   Moving '/__w/2/s/flink-python/target/surefire-reports/2020-05-25T09-59-37_808.dumpstream' to target directory ('/__w/2/s/tools/artifacts')
   Moving '/__w/2/s/flink-filesystems/flink-azure-fs-hadoop/target/surefire-reports/2020-05-25T09-59-37_808.dumpstream' to target directory ('/__w/2/s/tools/artifacts')
   Moving '/__w/2/s/flink-examples/flink-examples-streaming/target/surefire-reports/2020-05-25T09-59-37_808.dumpstream' to target directory ('/__w/2/s/tools/artifacts')
   ```
   
   Since all the files have the same name, we will only provide the last file to the user, not all of them.
   With this change, we are effectively renaming the file to contain the path, to make the filenames unique.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #12268: [FLINK-17375] Refactor travis_watchdog.sh into separate ci and azure scripts.

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12268:
URL: https://github.com/apache/flink/pull/12268#issuecomment-631497626


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 985fc536bcc3024468f50195fc3d341425a5b568 (Fri Oct 16 10:56:12 UTC 2020)
   
   **Warnings:**
    * Documentation files were touched, but no `.zh.md` files: Update Chinese documentation or file Jira ticket.
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.<details>
    The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org