Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 22:33:55 UTC

[GitHub] [beam] damccorm opened a new issue, #21269: Delete orphaned files

damccorm opened a new issue, #21269:
URL: https://github.com/apache/beam/issues/21269

   Until version 2.33.0 of Apache Beam (tested with a Python streaming pipeline consuming events from PubSub and writing them into GCS), some files were being deleted from the temporary folder before being moved to the destination. This was the original issue:
   
   https://issues.apache.org/jira/browse/BEAM-12950
   
   In version 2.34.0 we applied a temporary workaround to be sure that no data is dropped. Instead of deleting the orphaned files, we just log them:
   
   [https://github.com/apache/beam/pull/15576](https://github.com/apache/beam/pull/15576)
   
   Most probably the root cause of the missing events was that we were removing files at the wrong time. We need to delete orphaned files in a subsequent step (after we're sure that there won't be retries).
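
The ordering described above (log orphaned files while retries are still possible, delete them only in a later step) can be sketched roughly as follows. This is a minimal illustration on the local filesystem using only the standard library, not Beam's actual file-writing code; the names `find_orphaned_files`, `handle_orphans`, `committed_names` and `retries_possible` are hypothetical.

```python
import logging
import os

_LOGGER = logging.getLogger(__name__)


def find_orphaned_files(temp_dir, committed_names):
    """Return temp files that no successful write has claimed.

    Assumption: `committed_names` holds the names of temporary files that
    a successful bundle reported and that will be moved to the destination.
    """
    committed = set(committed_names)
    return sorted(
        os.path.join(temp_dir, name)
        for name in os.listdir(temp_dir)
        if name not in committed
    )


def handle_orphans(orphans, retries_possible):
    """Log orphans while a retry may still need them; delete them afterwards.

    Returns the list of paths that were actually deleted.
    """
    deleted = []
    for path in orphans:
        if retries_possible:
            # Workaround behaviour: keep the file and only report it.
            _LOGGER.info('Keeping orphaned temporary file: %s', path)
        else:
            # Safe to clean up: no retry can reference this file any more.
            os.remove(path)
            deleted.append(path)
    return deleted
```

Calling `handle_orphans(orphans, retries_possible=True)` mirrors the temporary workaround (nothing is removed), while a later call with `retries_possible=False` performs the deferred cleanup. How a real pipeline would know that retries are no longer possible is left open here.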
   
   Once the original issue is fixed and the orphaned files are deleted at the correct time, we should remove the skip decorator from the unit test that was skipped in the pull request above.
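
For readers unfamiliar with the mechanism, a skipped test in Python's `unittest` typically looks like the sketch below. The class name, test name and skip reason are hypothetical and are not taken from the pull request; re-enabling the test amounts to deleting the `@unittest.skip(...)` line.

```python
import unittest


class OrphanedFilesTest(unittest.TestCase):
    # Hypothetical example: this decorator is what would be removed
    # once orphaned files are deleted at the correct time.
    @unittest.skip('BEAM-13010: orphaned files are logged, not deleted')
    def test_orphaned_files_are_deleted(self):
        self.fail('Never reached while the skip decorator is present.')
```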
   
   Imported from Jira [BEAM-13010](https://issues.apache.org/jira/browse/BEAM-13010). Original Jira may contain additional context.
   Reported by: davidpr.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org