Posted to issues@beam.apache.org by "Wei Cheng A (JIRA)" <ji...@apache.org> on 2019/03/04 06:04:00 UTC
[jira] [Commented] (BEAM-6707) TextIO.Write appear success but
request not sent to Google Cloud Storage
[ https://issues.apache.org/jira/browse/BEAM-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782994#comment-16782994 ]
Wei Cheng A commented on BEAM-6707:
-----------------------------------
Hi Charles,
I'm not very familiar with Apache Beam, so I hope I understand the logic correctly.
In FileBasedSink.java, the rename() method is called with IGNORE_MISSING_FILES. If I read it correctly, the call can return without throwing an exception even when a source file is missing, and the code then proceeds to delete the temporary files (the removeTemporaryFiles method).
Is this working as intended?
https://github.com/apache/beam/blob/c96b096b77c324b886ab94aebcf320976002c0d4/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java#L767
{code:java}
FileSystems.rename(srcFiles, dstFiles, StandardMoveOptions.IGNORE_MISSING_FILES);
removeTemporaryFiles(srcFiles);
{code}
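To illustrate the concern, here is a minimal sketch that uses plain java.nio instead of Beam's FileSystems API. The class and method names (RenameThenVerify, renameIgnoringMissing, missingDestinations) are hypothetical and not part of Beam. The sketch only mimics a rename that ignores missing sources, and makes no claim about how Beam or the GCS connector behaves internally. It shows why a check on the destinations before cleaning up temporary files would make a silently skipped rename visible.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RenameThenVerify {

    /** Moves each source to its destination, silently skipping sources that do not exist. */
    static void renameIgnoringMissing(List<Path> srcs, List<Path> dsts) throws IOException {
        for (int i = 0; i < srcs.size(); i++) {
            try {
                Files.move(srcs.get(i), dsts.get(i), StandardCopyOption.REPLACE_EXISTING);
            } catch (NoSuchFileException e) {
                // Mimics an "ignore missing files" option: no error reaches the caller.
            }
        }
    }

    /** Returns the destinations that are still absent after the rename. */
    static List<Path> missingDestinations(List<Path> dsts) {
        List<Path> missing = new ArrayList<>();
        for (Path dst : dsts) {
            if (!Files.exists(dst)) {
                missing.add(dst);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-demo");
        Path present = Files.write(dir.resolve("temp-0"), "a,b\n".getBytes());
        Path absent = dir.resolve("temp-1"); // never created, so its rename is skipped

        List<Path> srcs = Arrays.asList(present, absent);
        List<Path> dsts = Arrays.asList(dir.resolve("out-0.csv"), dir.resolve("out-1.csv"));

        // Returns normally even though one source file was missing.
        renameIgnoringMissing(srcs, dsts);

        // Without a check like this, the caller cannot tell that out-1.csv was never written.
        System.out.println("Missing after rename: " + missingDestinations(dsts));
    }
}
```

In this sketch the rename call completes without an exception, and only the explicit check afterwards reveals that one output file does not exist.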
> TextIO.Write appear success but request not sent to Google Cloud Storage
> ------------------------------------------------------------------------
>
> Key: BEAM-6707
> URL: https://issues.apache.org/jira/browse/BEAM-6707
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.8.0
> Environment: Google Cloud Dataflow and Google Cloud Storage
> Reporter: Wei Cheng A
> Priority: Major
>
> Google Cloud Dataflow is being used to run an Apache Beam job.
> From the Dataflow log, the file operation appears to succeed:
> Will copy temporary file FileResult{tempFilename=gs://xxxxxx, shard=0, window=org.apache.beam.sdk.transforms.windowing.GlobalWindow@xxxxx, paneInfo=PaneInfo{isFirst=true, isLast=true, timing=ON_TIME, index=0, onTimeIndex=0}} to final location gs://xxxx/20190211.csv
> But when I checked GCS and its log, there was no PUT or POST request during that time.
> This issue happens intermittently. Sometimes the file is copied successfully after a retry.
> I have checked the relevant Beam source code
> https://github.com/apache/beam/blob/c96b096b77c324b886ab94aebcf320976002c0d4/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java#L763
> and
> https://github.com/apache/beam/blob/c96b096b77c324b886ab94aebcf320976002c0d4/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java#L304
> It seems that in the rename() method there are multiple conditions under which the method returns without an exception, so the operation appears as "success" in the log.
> Is there a bug in these Beam methods, or should I check for an error in my code?
> {code:java}
> TextIO.write().withoutSharding().to(options.getOutFilePath());
> {code}