Posted to issues@beam.apache.org by "David (Jira)" <ji...@apache.org> on 2021/10/06 14:24:00 UTC

[jira] [Created] (BEAM-13010) Delete orphaned files

David created BEAM-13010:
----------------------------

             Summary: Delete orphaned files
                 Key: BEAM-13010
                 URL: https://issues.apache.org/jira/browse/BEAM-13010
             Project: Beam
          Issue Type: Bug
          Components: io-py-files
    Affects Versions: 2.34.0
            Reporter: David
             Fix For: 2.35.0


Up to version 2.33.0 of Apache Beam (tested with a Python streaming pipeline that consumes events from PubSub and writes them to GCS), some files were deleted from the temporary folder before they were moved to the destination. This was the original issue:

https://issues.apache.org/jira/browse/BEAM-12950

In version 2.34.0 we applied a temporary workaround to be sure that no data is dropped. Instead of deleting the orphaned files, we just log them:

[https://github.com/apache/beam/pull/15576]

The most likely root cause of the missing events is that we were removing files at the wrong time. We need to delete orphaned files in a subsequent step, once we are sure that there will be no more retries.
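
A minimal sketch of the ordering proposed above, not the actual Beam code. It assumes local directories stand in for the GCS temporary and destination folders, and the names move_to_destination and delete_orphans are hypothetical. The move step never deletes anything and tolerates being retried; deletion of whatever is left happens only in a separate, later step.

```python
import os
import shutil
import tempfile


def move_to_destination(temp_paths, dest_dir):
    """Step 1: move each committed temp file; never delete anything here."""
    moved = []
    for src in temp_paths:
        # A retried bundle may find that the file was already moved.
        if os.path.exists(src):
            dst = os.path.join(dest_dir, os.path.basename(src))
            shutil.move(src, dst)
            moved.append(dst)
    return moved


def delete_orphans(temp_dir):
    """Step 2: run only once step 1 is final, so no retry needs these files."""
    orphans = sorted(os.listdir(temp_dir))
    for name in orphans:
        os.remove(os.path.join(temp_dir, name))
    return orphans


# Hypothetical usage: two committed shards plus one file from a stale attempt.
temp_dir = tempfile.mkdtemp()
dest_dir = tempfile.mkdtemp()
for name in ("shard-0", "shard-1", "stale-attempt"):
    with open(os.path.join(temp_dir, name), "w") as f:
        f.write(name)

committed = [os.path.join(temp_dir, n) for n in ("shard-0", "shard-1")]
moved = move_to_destination(committed, dest_dir)
moved_again = move_to_destination(committed, dest_dir)  # simulated retry
orphans = delete_orphans(temp_dir)  # only the stale file is left to delete
```

Because the retry finds nothing left to move and nothing has been deleted underneath it, the committed shards cannot be lost; only the file that no commit refers to is removed in the final step.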

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)