Posted to issues@beam.apache.org by "Wei Cheng A (JIRA)" <ji...@apache.org> on 2019/03/04 06:04:00 UTC

[jira] [Commented] (BEAM-6707) TextIO.Write appear success but request not sent to Google Cloud Storage

    [ https://issues.apache.org/jira/browse/BEAM-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782994#comment-16782994 ] 

Wei Cheng A commented on BEAM-6707:
-----------------------------------

Hi Charles,

I'm not very familiar with Apache Beam, so I hope I understand the logic correctly.

In FileBasedSink.java, the rename() method is called with IGNORE_MISSING_FILES. The method may return without throwing an exception, and the code then proceeds to delete the temporary files (the removeTemporaryFiles method).
Is this working as intended?
https://github.com/apache/beam/blob/c96b096b77c324b886ab94aebcf320976002c0d4/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java#L767

{code:java}
FileSystems.rename(srcFiles, dstFiles, StandardMoveOptions.IGNORE_MISSING_FILES);
removeTemporaryFiles(srcFiles);
{code}
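
To illustrate the concern, here is a minimal sketch that uses only java.nio.file and does not call Beam. It is not Beam's actual implementation: renameIgnoringMissing and missingDestinations are hypothetical helpers, and the assumption is that a rename which ignores missing sources simply skips them. Under that assumption the rename returns normally even though one destination was never written, so only an explicit check afterwards reveals the gap.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RenameCheckSketch {

    // Hypothetical stand-in for a rename that ignores missing sources
    // (an assumption about the behaviour, not Beam's code): a source that
    // does not exist is skipped silently instead of raising an error.
    public static void renameIgnoringMissing(List<Path> srcs, List<Path> dsts) throws IOException {
        for (int i = 0; i < srcs.size(); i++) {
            if (!Files.exists(srcs.get(i))) {
                continue; // skipped, no exception is thrown
            }
            Files.move(srcs.get(i), dsts.get(i), StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Hypothetical post-rename check: returns every destination that does not exist.
    public static List<Path> missingDestinations(List<Path> dsts) {
        List<Path> missing = new ArrayList<>();
        for (Path dst : dsts) {
            if (!Files.exists(dst)) {
                missing.add(dst);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-sketch");
        Path present = Files.write(dir.resolve("tmp-0"), new byte[] {'a'});
        Path absent = dir.resolve("tmp-1"); // this temporary file is never created

        List<Path> srcs = Arrays.asList(present, absent);
        List<Path> dsts = Arrays.asList(dir.resolve("out-0.csv"), dir.resolve("out-1.csv"));

        // Returns normally although the second source does not exist.
        renameIgnoringMissing(srcs, dsts);

        // Without a check like this, cleanup would run as if everything had succeeded.
        System.out.println("Destinations missing after rename: " + missingDestinations(dsts));
    }
}
```

If something like this happens in the sink, checking that each destination exists before deleting the temporary files would at least make the failure visible in the log.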


> TextIO.Write appear success but request not sent to Google Cloud Storage
> ------------------------------------------------------------------------
>
>                 Key: BEAM-6707
>                 URL: https://issues.apache.org/jira/browse/BEAM-6707
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>    Affects Versions: 2.8.0
>         Environment: Google Cloud Dataflow and Google Cloud Storage
>            Reporter: Wei Cheng A
>            Priority: Major
>
> Google Cloud Dataflow is being used to run an Apache Beam job.
> According to the Dataflow log, the file operation appears to succeed:
> Will copy temporary file FileResult{tempFilename=gs://xxxxxx, shard=0, window=org.apache.beam.sdk.transforms.windowing.GlobalWindow@xxxxx, paneInfo=PaneInfo{isFirst=true, isLast=true, timing=ON_TIME, index=0, onTimeIndex=0}} to final location gs://xxxx/20190211.csv
> But when I checked GCS and its log, there was no PUT or POST request during that time.
> This issue happens intermittently. The file is sometimes copied successfully after a retry.
> I have checked the relevant Beam source code 
> https://github.com/apache/beam/blob/c96b096b77c324b886ab94aebcf320976002c0d4/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java#L763
> and
> https://github.com/apache/beam/blob/c96b096b77c324b886ab94aebcf320976002c0d4/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java#L304
> It seems that in the rename() method there are multiple conditions under which the method returns without an exception and the operation appears as "success" in the log.
> Is there a bug in these Beam methods, or should I check for an error in my code?
> {code:java}
> TextIO.write().withoutSharding().to(options.getOutFilePath());
> {code}


