You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/08/10 13:31:20 UTC

[GitHub] [beam] scwhittle opened a new pull request #15301: Change renames to handle retries internally

scwhittle opened a new pull request #15301:
URL: https://github.com/apache/beam/pull/15301


   **Please** add a meaningful description for your change here
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   `ValidatesRunner` compliance status (on master branch)
   --------------------------------------------------------
   
   <table>
     <thead>
       <tr>
         <th>Lang</th>
         <th>ULR</th>
         <th>Dataflow</th>
         <th>Flink</th>
         <th>Samza</th>
         <th>Spark</th>
         <th>Twister2</th>
       </tr>
     </thead>
     <tbody>
       <tr>
         <td>Go</td>
         <td>---</td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon">
           </a>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Samza/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Samza/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>---</td>
       </tr>
       <tr>
         <td>Java</td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_ULR/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_ULR/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon?subject=V1">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Streaming/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Streaming/lastCompletedBuild/badge/icon?subject=V1+Streaming">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon?subject=V1+Java+11">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2/lastCompletedBuild/badge/icon?subject=V2">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2_Streaming/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java_VR_Dataflow_V2_Streaming/lastCompletedBuild/badge/icon?subject=V2+Streaming">
           </a><br>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon?subject=Java+8">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon?subject=Java+11">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon?subject=Portable">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon?subject=Portable+Streaming">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Samza/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Samza/lastCompletedBuild/badge/icon?subject=Portable">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon?subject=Portable">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon?subject=Structured+Streaming">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon">
           </a>
         </td>
       </tr>
       <tr>
         <td>Python</td>
         <td>---</td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon?subject=V1">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon?subject=V2">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon?subject=ValCont">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon?subject=Portable">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Flink/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Samza/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Samza/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>---</td>
       </tr>
       <tr>
         <td>XLang</td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Dataflow/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>---</td>
       </tr>
     </tbody>
   </table>
   
   Examples testing status on various runners
   --------------------------------------------------------
   
   <table>
     <thead>
       <tr>
         <th>Lang</th>
         <th>ULR</th>
         <th>Dataflow</th>
         <th>Flink</th>
         <th>Samza</th>
         <th>Spark</th>
         <th>Twister2</th>
       </tr>
     </thead>
     <tbody>
       <tr>
         <td>Go</td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
       </tr>
       <tr>
         <td>Java</td>
         <td>---</td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/lastCompletedBuild/badge/icon?subject=V1">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Java11_Cron/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Java11_Cron/lastCompletedBuild/badge/icon?subject=V1+Java11">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Dataflow_V2/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java_Examples_Dataflow_V2/lastCompletedBuild/badge/icon?subject=V2">
           </a><br>
         </td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
       </tr>
       <tr>
         <td>Python</td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
       </tr>
       <tr>
         <td>XLang</td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
       </tr>
     </tbody>
   </table>
   
   Post-Commit SDK/Transform Integration Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   <table>
     <thead>
       <tr>
         <th>Go</th>
         <th>Java</th>
         <th>Python</th>
       </tr>
     </thead>
     <tbody>
       <tr>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon?subject=3.6">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon?subject=3.7">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon?subject=3.8">
           </a>
         </td>
       </tr>
     </tbody>
   </table>
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   <table>
     <thead>
       <tr>
         <th>---</th>
         <th>Java</th>
         <th>Python</th>
         <th>Go</th>
         <th>Website</th>
         <th>Whitespace</th>
         <th>Typescript</th>
       </tr>
     </thead>
     <tbody>
       <tr>
         <td>Non-portable</td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon">
           </a><br>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon?subject=Tests">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon?subject=Lint">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/badge/icon?subject=Docker">
           </a><br>
           <a href="https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PreCommit_PythonDocs_Cron/badge/icon?subject=Docs">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PreCommit_Whitespace_Cron/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PreCommit_Typescript_Cron/lastCompletedBuild/badge/icon">
           </a>
         </td>
       </tr>
       <tr>
         <td>Portable</td>
         <td>---</td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>
           <a href="https://ci-beam.apache.org/job/beam_PreCommit_GoPortable_Cron/lastCompletedBuild/">
             <img alt="Build Status" src="https://ci-beam.apache.org/job/beam_PreCommit_GoPortable_Cron/lastCompletedBuild/badge/icon">
           </a>
         </td>
         <td>---</td>
         <td>---</td>
         <td>---</td>
       </tr>
     </tbody>
   </table>
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
scwhittle commented on pull request #15301:
URL: https://github.com/apache/beam/pull/15301#issuecomment-899506458


   Run Java_Examples_Dataflow_Java11 PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
scwhittle commented on pull request #15301:
URL: https://github.com/apache/beam/pull/15301#issuecomment-897515410


   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on a change in pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #15301:
URL: https://github.com/apache/beam/pull/15301#discussion_r687082094



##########
File path: sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystem.java
##########
@@ -545,7 +546,9 @@ CompleteMultipartUploadResult multipartCopy(
 
   @Override
   protected void rename(
-      List<S3ResourceId> sourceResourceIds, List<S3ResourceId> destinationResourceIds)
+      List<S3ResourceId> sourceResourceIds,
+      List<S3ResourceId> destinationResourceIds,
+      MoveOptions... moveOptions)

Review comment:
       We should raise an UnsupportedOperationexception for this and other FileSystems when MoveOptions that are not fully supported are used.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #15301:
URL: https://github.com/apache/beam/pull/15301#issuecomment-901309073


   Thanks. Yeah, we can merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #15301:
URL: https://github.com/apache/beam/pull/15301#issuecomment-897067405


   Not sure why word count failed, but re-trying filed tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #15301:
URL: https://github.com/apache/beam/pull/15301#issuecomment-897109789


   LGTM. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
scwhittle commented on pull request #15301:
URL: https://github.com/apache/beam/pull/15301#issuecomment-897515648






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
scwhittle commented on pull request #15301:
URL: https://github.com/apache/beam/pull/15301#issuecomment-899488723


   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #15301:
URL: https://github.com/apache/beam/pull/15301#issuecomment-897386107


   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
scwhittle commented on pull request #15301:
URL: https://github.com/apache/beam/pull/15301#issuecomment-896835108


   @reuvenlax @chamikaramj 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
scwhittle commented on pull request #15301:
URL: https://github.com/apache/beam/pull/15301#issuecomment-900929475


   @chamikaramj @lukecwik Tests are passing now, I think this is ready to merge 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on a change in pull request #15301: Change renames to handle retries internally

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #15301:
URL: https://github.com/apache/beam/pull/15301#discussion_r686189167



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
##########
@@ -770,12 +770,10 @@ final void moveToOutputFiles(
       }
       // During a failure case, files may have been deleted in an earlier step. Thus
       // we ignore missing files here.
-      FileSystems.rename(
-          srcFiles,
-          dstFiles,
-          StandardMoveOptions.IGNORE_MISSING_FILES,
-          StandardMoveOptions.SKIP_IF_DESTINATION_EXISTS);
-      removeTemporaryFiles(srcFiles);

Review comment:
       I think we still need removeTemporaryFiles() to remove the temporary directory for batch.

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
##########
@@ -770,12 +770,10 @@ final void moveToOutputFiles(
       }
       // During a failure case, files may have been deleted in an earlier step. Thus
       // we ignore missing files here.
-      FileSystems.rename(
-          srcFiles,
-          dstFiles,
-          StandardMoveOptions.IGNORE_MISSING_FILES,

Review comment:
       We should make sure that not passing these options do not end up being a regression for other file systems (S3, HDFS, local, etc.)

##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java
##########
@@ -317,6 +317,10 @@ public static void rename(
       return;
     }
 
+    // XXX: Instead of filtering here we should pass options on to filesystem as it will be able to

Review comment:
       This is a TODO ?

##########
File path: sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java
##########
@@ -681,21 +734,77 @@ public void onSuccess(RewriteResponse rewriteResponse, HttpHeaders responseHeade
 
     @Override
     public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) throws IOException {
-      readyToEnqueue = false;
-      throw new IOException(String.format("Error trying to rewrite %s to %s: %s", from, to, e));
+      if (e.getCode() == HttpStatusCodes.STATUS_CODE_NOT_FOUND) {
+        if (move) {
+          // Treat the missing source as a successful move. We don't verify the destination file
+          // exists as it may have subsequently been moved by something else.
+          readyToEnqueue = false;
+          lastError = null;
+        } else {
+          throw new FileNotFoundException(from.toString());
+        }
+      } else {
+        lastError = e;
+        readyToEnqueue = true;
+      }
     }
   }
 
   public void copy(Iterable<String> srcFilenames, Iterable<String> destFilenames)
       throws IOException {
-    LinkedList<RewriteOp> rewrites = makeRewriteOps(srcFilenames, destFilenames);
-    while (rewrites.size() > 0) {
-      executeBatches(makeCopyBatches(rewrites));
+    rewrite(srcFilenames, destFilenames, false);
+  }
+
+  public void rename(Iterable<String> srcFilenames, Iterable<String> destFilenames)
+      throws IOException {
+    rewrite(srcFilenames, destFilenames, true);
+  }
+
+  private void rewrite(Iterable<String> srcFilenames, Iterable<String> destFilenames, boolean move)
+      throws IOException {
+    LinkedList<RewriteOp> rewrites = makeRewriteOps(srcFilenames, destFilenames, move);
+    org.apache.beam.sdk.util.BackOff backoff = BACKOFF_FACTORY.backoff();
+    while (true) {
+      List<BatchRequest> batches = makeCopyBatches(rewrites); // Removes completed rewrite ops.
+      if (batches.isEmpty()) {
+        break;
+      }
+      RewriteOp sampleErrorOp =
+          rewrites.stream().filter(op -> op.getLastError() != null).findFirst().orElse(null);

Review comment:
       Should we just skip 404 failures here ? If we ignore other errors and end up not copying some files it could lead to data loss.

##########
File path: sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java
##########
@@ -625,29 +625,76 @@ private static void executeBatches(List<BatchRequest> batches) throws IOExceptio
    * round of enqueue() and execute is required. Repeat until getReadyToEnqueue() returns false.
    */
   class RewriteOp extends JsonBatchCallback<RewriteResponse> {
-    private GcsPath from;
-    private GcsPath to;
+    private final GcsPath from;
+    private final GcsPath to;
+    private final boolean move;
     private boolean readyToEnqueue;
+    private boolean performDelete;
+    private GoogleJsonError lastError;
     @VisibleForTesting Storage.Objects.Rewrite rewriteRequest;
 
     public boolean getReadyToEnqueue() {
       return readyToEnqueue;
     }
 
+    public GoogleJsonError getLastError() {
+      return lastError;
+    }
+
+    public GcsPath getFrom() {
+      return from;
+    }
+
+    public GcsPath getTo() {
+      return to;
+    }
+
     public void enqueue(BatchRequest batch) throws IOException {
       if (!readyToEnqueue) {
         throw new IOException(
             String.format(
                 "Invalid state for Rewrite, from=%s, to=%s, readyToEnqueue=%s",
                 from, to, readyToEnqueue));
       }
-      rewriteRequest.queue(batch, this);
-      readyToEnqueue = false;
+      LOG.info("XXX enqueue op");
+      if (performDelete) {
+        Storage.Objects.Delete deleteRequest =
+            storageClient.objects().delete(from.getBucket(), from.getObject());
+        deleteRequest.queue(
+            batch,
+            new JsonBatchCallback<Void>() {
+              @Override
+              public void onSuccess(Void obj, HttpHeaders responseHeaders) {
+                LOG.debug("Successfully deleted {} after moving to {}", from, to);
+                readyToEnqueue = false;
+                lastError = null;
+              }
+
+              @Override
+              public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders)
+                  throws IOException {
+                if (e.getCode() == 404) {
+                  LOG.info(
+                      "Ignoring failed deletion of moved file {} which already does not exist: {}",
+                      from,
+                      e);
+                  readyToEnqueue = false;
+                  lastError = null;
+                } else {
+                  readyToEnqueue = true;
+                  lastError = e;
+                }
+              }
+            });
+      } else {
+        rewriteRequest.queue(batch, this);
+      }
     }
 
-    public RewriteOp(GcsPath from, GcsPath to) throws IOException {
+    public RewriteOp(GcsPath from, GcsPath to, boolean move) throws IOException {

Review comment:
       I believe rewrite means copy, so this option muddles the semantics a bit.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
scwhittle commented on pull request #15301:
URL: https://github.com/apache/beam/pull/15301#issuecomment-896834893


   PTAL, I removed the hacks for testing since the performance seemed greatly improved.
   Additionally I removed the unnecessary delete of renamed files in FileBasedSink by improving FileSystems.remove to always delete src files (even if the copy is skipped due to dest existing and that being filtered).
   I added a unit test for rename by adding support to inject a batch object. The apiary BatchRequest object is difficult to mock so I opted for mocking above that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
scwhittle commented on pull request #15301:
URL: https://github.com/apache/beam/pull/15301#issuecomment-899506199


   Run Java_Examples_Dataflow PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on a change in pull request #15301: Change renames to handle retries internally

Posted by GitBox <gi...@apache.org>.
scwhittle commented on a change in pull request #15301:
URL: https://github.com/apache/beam/pull/15301#discussion_r686252071



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
##########
@@ -770,12 +770,10 @@ final void moveToOutputFiles(
       }
       // During a failure case, files may have been deleted in an earlier step. Thus
       // we ignore missing files here.
-      FileSystems.rename(
-          srcFiles,
-          dstFiles,
-          StandardMoveOptions.IGNORE_MISSING_FILES,
-          StandardMoveOptions.SKIP_IF_DESTINATION_EXISTS);
-      removeTemporaryFiles(srcFiles);

Review comment:
       adding back, I think we need this as well if we are filtering if the destination exists because we still need to remove the source in that case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on a change in pull request #15301: Change renames to handle retries internally

Posted by GitBox <gi...@apache.org>.
scwhittle commented on a change in pull request #15301:
URL: https://github.com/apache/beam/pull/15301#discussion_r686250554



##########
File path: sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java
##########
@@ -681,21 +734,77 @@ public void onSuccess(RewriteResponse rewriteResponse, HttpHeaders responseHeade
 
     @Override
     public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) throws IOException {
-      readyToEnqueue = false;
-      throw new IOException(String.format("Error trying to rewrite %s to %s: %s", from, to, e));
+      if (e.getCode() == HttpStatusCodes.STATUS_CODE_NOT_FOUND) {
+        if (move) {
+          // Treat the missing source as a successful move. We don't verify the destination file
+          // exists as it may have subsequently been moved by something else.
+          readyToEnqueue = false;
+          lastError = null;
+        } else {
+          throw new FileNotFoundException(from.toString());
+        }
+      } else {
+        lastError = e;
+        readyToEnqueue = true;
+      }
     }
   }
 
   public void copy(Iterable<String> srcFilenames, Iterable<String> destFilenames)
       throws IOException {
-    LinkedList<RewriteOp> rewrites = makeRewriteOps(srcFilenames, destFilenames);
-    while (rewrites.size() > 0) {
-      executeBatches(makeCopyBatches(rewrites));
+    rewrite(srcFilenames, destFilenames, false);
+  }
+
+  public void rename(Iterable<String> srcFilenames, Iterable<String> destFilenames)
+      throws IOException {
+    rewrite(srcFilenames, destFilenames, true);
+  }
+
+  private void rewrite(Iterable<String> srcFilenames, Iterable<String> destFilenames, boolean move)
+      throws IOException {
+    LinkedList<RewriteOp> rewrites = makeRewriteOps(srcFilenames, destFilenames, move);
+    org.apache.beam.sdk.util.BackOff backoff = BACKOFF_FACTORY.backoff();
+    while (true) {
+      List<BatchRequest> batches = makeCopyBatches(rewrites); // Removes completed rewrite ops.
+      if (batches.isEmpty()) {
+        break;
+      }
+      RewriteOp sampleErrorOp =
+          rewrites.stream().filter(op -> op.getLastError() != null).findFirst().orElse(null);

Review comment:
       we skip the 404 errors in the per-op handling above, so the error here is an error we want to retry
   
   we are not skipping, we either retry or once we hit the backoff limit we propagate one of the errors




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on a change in pull request #15301: Change renames to handle retries internally

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on a change in pull request #15301:
URL: https://github.com/apache/beam/pull/15301#discussion_r686380688



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
##########
@@ -770,12 +770,10 @@ final void moveToOutputFiles(
       }
       // During a failure case, files may have been deleted in an earlier step. Thus
       // we ignore missing files here.
-      FileSystems.rename(
-          srcFiles,
-          dstFiles,
-          StandardMoveOptions.IGNORE_MISSING_FILES,
-          StandardMoveOptions.SKIP_IF_DESTINATION_EXISTS);
-      removeTemporaryFiles(srcFiles);

Review comment:
       Might make sense to update the semantics of  "StandardMoveOptions.SKIP_IF_DESTINATION_EXISTS" to delete the source during rename. That will allow us to prevent the double delete for the case where the source existed.
   
   Also probably removeTemporaryFiles() should be updated to just cleanup the temporary directory (where appropriate) instead of trying to delete already renamed files. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on a change in pull request #15301: Change renames to handle retries internally

Posted by GitBox <gi...@apache.org>.
scwhittle commented on a change in pull request #15301:
URL: https://github.com/apache/beam/pull/15301#discussion_r686811469



##########
File path: sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java
##########
@@ -625,29 +625,76 @@ private static void executeBatches(List<BatchRequest> batches) throws IOExceptio
    * round of enqueue() and execute is required. Repeat until getReadyToEnqueue() returns false.
    */
   class RewriteOp extends JsonBatchCallback<RewriteResponse> {
-    private GcsPath from;
-    private GcsPath to;
+    private final GcsPath from;
+    private final GcsPath to;
+    private final boolean move;
     private boolean readyToEnqueue;
+    private boolean performDelete;
+    private GoogleJsonError lastError;
     @VisibleForTesting Storage.Objects.Rewrite rewriteRequest;
 
     public boolean getReadyToEnqueue() {
       return readyToEnqueue;
     }
 
+    public GoogleJsonError getLastError() {
+      return lastError;
+    }
+
+    public GcsPath getFrom() {
+      return from;
+    }
+
+    public GcsPath getTo() {
+      return to;
+    }
+
     public void enqueue(BatchRequest batch) throws IOException {
       if (!readyToEnqueue) {
         throw new IOException(
             String.format(
                 "Invalid state for Rewrite, from=%s, to=%s, readyToEnqueue=%s",
                 from, to, readyToEnqueue));
       }
-      rewriteRequest.queue(batch, this);
-      readyToEnqueue = false;
+      LOG.info("XXX enqueue op");
+      if (performDelete) {
+        Storage.Objects.Delete deleteRequest =
+            storageClient.objects().delete(from.getBucket(), from.getObject());
+        deleteRequest.queue(
+            batch,
+            new JsonBatchCallback<Void>() {
+              @Override
+              public void onSuccess(Void obj, HttpHeaders responseHeaders) {
+                LOG.debug("Successfully deleted {} after moving to {}", from, to);
+                readyToEnqueue = false;
+                lastError = null;
+              }
+
+              @Override
+              public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders)
+                  throws IOException {
+                if (e.getCode() == 404) {
+                  LOG.info(
+                      "Ignoring failed deletion of moved file {} which already does not exist: {}",
+                      from,
+                      e);
+                  readyToEnqueue = false;
+                  lastError = null;
+                } else {
+                  readyToEnqueue = true;
+                  lastError = e;
+                }
+              }
+            });
+      } else {
+        rewriteRequest.queue(batch, this);
+      }
     }
 
-    public RewriteOp(GcsPath from, GcsPath to) throws IOException {
+    public RewriteOp(GcsPath from, GcsPath to, boolean move) throws IOException {

Review comment:
       renamed the option to make what it does clearer. It is convenient to have a single op shared by copy and rename.  Let me know if you have other naming ideas.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on a change in pull request #15301: Change renames to handle retries internally

Posted by GitBox <gi...@apache.org>.
scwhittle commented on a change in pull request #15301:
URL: https://github.com/apache/beam/pull/15301#discussion_r686810614



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java
##########
@@ -317,6 +317,10 @@ public static void rename(
       return;
     }
 
+    // XXX: Instead of filtering here we should pass options on to filesystem as it will be able to

Review comment:
       Fixed all XXXs




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on a change in pull request #15301: Change renames to handle retries internally

Posted by GitBox <gi...@apache.org>.
scwhittle commented on a change in pull request #15301:
URL: https://github.com/apache/beam/pull/15301#discussion_r686656470



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
##########
@@ -770,12 +770,10 @@ final void moveToOutputFiles(
       }
       // During a failure case, files may have been deleted in an earlier step. Thus
       // we ignore missing files here.
-      FileSystems.rename(
-          srcFiles,
-          dstFiles,
-          StandardMoveOptions.IGNORE_MISSING_FILES,
-          StandardMoveOptions.SKIP_IF_DESTINATION_EXISTS);
-      removeTemporaryFiles(srcFiles);

Review comment:
       I modified FileSystems.rename to delete srcs that existed but were filtered due to dest existing.
   I kept the existing methods in FileBasedSink because it appears they are designed to be called by subclasses. I changed to pass an empty set for known files after the rename to avoid the unnecessary delete.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #15301:
URL: https://github.com/apache/beam/pull/15301#issuecomment-897061512


   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #15301:
URL: https://github.com/apache/beam/pull/15301#issuecomment-897066971


   Run Java_Examples_Dataflow_Java11 PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj merged pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
chamikaramj merged pull request #15301:
URL: https://github.com/apache/beam/pull/15301


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
scwhittle commented on pull request #15301:
URL: https://github.com/apache/beam/pull/15301#issuecomment-900929475


   @chamikaramj @lukecwik Tests are passing now, I think this is ready to merge 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] chamikaramj commented on pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on pull request #15301:
URL: https://github.com/apache/beam/pull/15301#issuecomment-897066561


   Run Java_Examples_Dataflow PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
lukecwik commented on pull request #15301:
URL: https://github.com/apache/beam/pull/15301#issuecomment-899735845






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on a change in pull request #15301: [BEAM-12740] [BEAM-8268] Improve error handling and retries for GCS rename used by FileBasedSink

Posted by GitBox <gi...@apache.org>.
scwhittle commented on a change in pull request #15301:
URL: https://github.com/apache/beam/pull/15301#discussion_r687127247



##########
File path: sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/S3FileSystem.java
##########
@@ -545,7 +546,9 @@ CompleteMultipartUploadResult multipartCopy(
 
   @Override
   protected void rename(
-      List<S3ResourceId> sourceResourceIds, List<S3ResourceId> destinationResourceIds)
+      List<S3ResourceId> sourceResourceIds,
+      List<S3ResourceId> destinationResourceIds,
+      MoveOptions... moveOptions)

Review comment:
       Done, changed FileSystems to catch the exception and retry without move options to keep existing behavior for filesystems that don't support options.
   
   I also fixed bug where I wasn't forwarding on the move options as I thought to the filesystem in FileSystems.rename ( and thus wasn't using the improved error handling in GcsFileSystem)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on a change in pull request #15301: Change renames to handle retries internally

Posted by GitBox <gi...@apache.org>.
scwhittle commented on a change in pull request #15301:
URL: https://github.com/apache/beam/pull/15301#discussion_r686251237



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java
##########
@@ -317,6 +317,10 @@ public static void rename(
       return;
     }
 
+    // XXX: Instead of filtering here we should pass options on to filesystem as it will be able to

Review comment:
       I'm leaving XXX for things I don't want to submit
   
   I am pushing this for a preliminary review that it doesn't seem broken and then so we can load test with it.  It definitely has warts that need to be addressed before submiting but we can do that if it turns out to be beneficial.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] scwhittle commented on a change in pull request #15301: Change renames to handle retries internally

Posted by GitBox <gi...@apache.org>.
scwhittle commented on a change in pull request #15301:
URL: https://github.com/apache/beam/pull/15301#discussion_r686251628



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java
##########
@@ -770,12 +770,10 @@ final void moveToOutputFiles(
       }
       // During a failure case, files may have been deleted in an earlier step. Thus
       // we ignore missing files here.
-      FileSystems.rename(
-          srcFiles,
-          dstFiles,
-          StandardMoveOptions.IGNORE_MISSING_FILES,

Review comment:
       Moved back, as get operations have different quota than writes the filtering might be beneficial.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org