You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "XComp (via GitHub)" <gi...@apache.org> on 2024/01/31 14:05:14 UTC

[PR] [FLINK-34324][test] Makes all s3 related operations being declared and called in a single location [flink]

XComp opened a new pull request, #24244:
URL: https://github.com/apache/flink/pull/24244

   ## What is the purpose of the change
   
   * Fixes error message in test execution
   * unifies S3-related code in one location
   
   ## Brief change log
   
   * Moves the entire s3-related logic into the if block
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? not applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [FLINK-34324][test] Makes all s3 related operations being declared and called in a single location [flink]

Posted by "zentol (via GitHub)" <gi...@apache.org>.
zentol commented on code in PR #24244:
URL: https://github.com/apache/flink/pull/24244#discussion_r1474261856


##########
flink-end-to-end-tests/test-scripts/test_file_sink.sh:
##########
@@ -96,13 +51,54 @@ function get_complete_result {
 #   line number in part files
 ###################################
 function get_total_number_of_valid_lines {
-  if [ "${OUT_TYPE}" == "local" ]; then
-    get_complete_result | wc -l | tr -d '[:space:]'
-  elif [ "${OUT_TYPE}" == "s3" ]; then
-    s3_get_number_of_lines_by_prefix "${S3_PREFIX}" "part-"
-  fi
+  get_complete_result | wc -l | tr -d '[:space:]'
 }
 
+if [ "${OUT_TYPE}" == "s3" ]; then
+  source "$(dirname "$0")"/common_s3.sh
+  s3_setup hadoop
+
+  JOB_OUTPUT_PATH="s3://$IT_CASE_S3_BUCKET/$S3_PREFIX"
+  set_config_key "state.checkpoints.dir" "s3://$IT_CASE_S3_BUCKET/$S3_PREFIX-chk"
+  mkdir -p "$OUTPUT_PATH-chk"
+
+  # overwrites implementation for local runs
+  function get_complete_result {
+    s3_get_by_full_path_and_filename_prefix "$OUTPUT_PATH" "$S3_PREFIX" "part-" true
+  }

Review Comment:
   Or is that a no-op because in s3 mode we never write anything there...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [FLINK-34324][test] Makes all s3 related operations being declared and called in a single location [flink]

Posted by "flinkbot (via GitHub)" <gi...@apache.org>.
flinkbot commented on PR #24244:
URL: https://github.com/apache/flink/pull/24244#issuecomment-1919180871

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8bf09a9c3c67b9561ebe461f130f3aebb2eb908a",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "8bf09a9c3c67b9561ebe461f130f3aebb2eb908a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8bf09a9c3c67b9561ebe461f130f3aebb2eb908a UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [FLINK-34324][test] Makes all s3 related operations being declared and called in a single location [flink]

Posted by "zentol (via GitHub)" <gi...@apache.org>.
zentol commented on code in PR #24244:
URL: https://github.com/apache/flink/pull/24244#discussion_r1474268805


##########
flink-end-to-end-tests/test-scripts/test_file_sink.sh:
##########
@@ -96,13 +51,54 @@ function get_complete_result {
 #   line number in part files
 ###################################
 function get_total_number_of_valid_lines {
-  if [ "${OUT_TYPE}" == "local" ]; then
-    get_complete_result | wc -l | tr -d '[:space:]'
-  elif [ "${OUT_TYPE}" == "s3" ]; then
-    s3_get_number_of_lines_by_prefix "${S3_PREFIX}" "part-"
-  fi
+  get_complete_result | wc -l | tr -d '[:space:]'
 }
 
+if [ "${OUT_TYPE}" == "s3" ]; then
+  source "$(dirname "$0")"/common_s3.sh
+  s3_setup hadoop
+
+  JOB_OUTPUT_PATH="s3://$IT_CASE_S3_BUCKET/$S3_PREFIX"
+  set_config_key "state.checkpoints.dir" "s3://$IT_CASE_S3_BUCKET/$S3_PREFIX-chk"
+  mkdir -p "$OUTPUT_PATH-chk"
+
+  # overwrites implementation for local runs
+  function get_complete_result {
+    s3_get_by_full_path_and_filename_prefix "$OUTPUT_PATH" "$S3_PREFIX" "part-" true
+  }

Review Comment:
   I wonder why we are reading from OUTPUT_PATH and not JOB_OUTPUT_PATH 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [FLINK-34324][test] Makes all s3 related operations being declared and called in a single location [flink]

Posted by "zentol (via GitHub)" <gi...@apache.org>.
zentol commented on code in PR #24244:
URL: https://github.com/apache/flink/pull/24244#discussion_r1474255709


##########
flink-end-to-end-tests/test-scripts/test_file_sink.sh:
##########
@@ -96,13 +51,54 @@ function get_complete_result {
 #   line number in part files
 ###################################
 function get_total_number_of_valid_lines {
-  if [ "${OUT_TYPE}" == "local" ]; then
-    get_complete_result | wc -l | tr -d '[:space:]'
-  elif [ "${OUT_TYPE}" == "s3" ]; then
-    s3_get_number_of_lines_by_prefix "${S3_PREFIX}" "part-"
-  fi
+  get_complete_result | wc -l | tr -d '[:space:]'
 }
 
+if [ "${OUT_TYPE}" == "s3" ]; then
+  source "$(dirname "$0")"/common_s3.sh
+  s3_setup hadoop
+
+  JOB_OUTPUT_PATH="s3://$IT_CASE_S3_BUCKET/$S3_PREFIX"
+  set_config_key "state.checkpoints.dir" "s3://$IT_CASE_S3_BUCKET/$S3_PREFIX-chk"
+  mkdir -p "$OUTPUT_PATH-chk"
+
+  # overwrites implementation for local runs
+  function get_complete_result {
+    s3_get_by_full_path_and_filename_prefix "$OUTPUT_PATH" "$S3_PREFIX" "part-" true
+  }
+
+  # overwrites implementation for local runs
+  function get_total_number_of_valid_lines {
+    s3_get_number_of_lines_by_prefix "${S3_PREFIX}" "part-"
+  }
+
+  # make sure we delete the file at the end
+  function out_cleanup {
+    s3_delete_by_full_path_prefix "$S3_PREFIX"
+    s3_delete_by_full_path_prefix "${S3_PREFIX}-chk"
+    rollback_openssl_lib
+  }
+
+  on_exit out_cleanup
+elif [ "${OUT_TYPE}" == "local" ]; then

Review Comment:
   How about we use this in the top-most branch so that the local setup is closer together?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [FLINK-34324][test] Makes all s3 related operations being declared and called in a single location [flink]

Posted by "zentol (via GitHub)" <gi...@apache.org>.
zentol commented on code in PR #24244:
URL: https://github.com/apache/flink/pull/24244#discussion_r1474257337


##########
flink-end-to-end-tests/test-scripts/test_file_sink.sh:
##########
@@ -96,13 +51,54 @@ function get_complete_result {
 #   line number in part files
 ###################################
 function get_total_number_of_valid_lines {
-  if [ "${OUT_TYPE}" == "local" ]; then
-    get_complete_result | wc -l | tr -d '[:space:]'
-  elif [ "${OUT_TYPE}" == "s3" ]; then
-    s3_get_number_of_lines_by_prefix "${S3_PREFIX}" "part-"
-  fi
+  get_complete_result | wc -l | tr -d '[:space:]'
 }
 
+if [ "${OUT_TYPE}" == "s3" ]; then
+  source "$(dirname "$0")"/common_s3.sh
+  s3_setup hadoop
+
+  JOB_OUTPUT_PATH="s3://$IT_CASE_S3_BUCKET/$S3_PREFIX"
+  set_config_key "state.checkpoints.dir" "s3://$IT_CASE_S3_BUCKET/$S3_PREFIX-chk"
+  mkdir -p "$OUTPUT_PATH-chk"
+
+  # overwrites implementation for local runs
+  function get_complete_result {
+    s3_get_by_full_path_and_filename_prefix "$OUTPUT_PATH" "$S3_PREFIX" "part-" true
+  }

Review Comment:
   This is missing the final line from the original get_complete_result function?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [FLINK-34324][test] Makes all s3 related operations being declared and called in a single location [flink]

Posted by "XComp (via GitHub)" <gi...@apache.org>.
XComp closed pull request #24244: [FLINK-34324][test] Makes all s3 related operations being declared and called in a single location
URL: https://github.com/apache/flink/pull/24244


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [FLINK-34324][test] Makes all s3 related operations being declared and called in a single location [flink]

Posted by "XComp (via GitHub)" <gi...@apache.org>.
XComp commented on PR #24244:
URL: https://github.com/apache/flink/pull/24244#issuecomment-1983251954

   Closing this one becaue [there are still issues](https://issues.apache.org/jira/browse/FLINK-34324?focusedCommentId=17814891&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17814891).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [FLINK-34324][test] Makes all s3 related operations being declared and called in a single location [flink]

Posted by "XComp (via GitHub)" <gi...@apache.org>.
XComp commented on PR #24244:
URL: https://github.com/apache/flink/pull/24244#issuecomment-1944268824

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org