Posted to issues@beam.apache.org by "Kyle Weaver (Jira)" <ji...@apache.org> on 2021/02/10 02:41:00 UTC

[jira] [Updated] (BEAM-10261) [FileIO] Unexpected exception thrown when retrieving a GCS file with a space inside path

     [ https://issues.apache.org/jira/browse/BEAM-10261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kyle Weaver updated BEAM-10261:
-------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Triage Needed)

I can confirm that this issue is fixed in Beam 2.26.0.
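
For pipelines that must read paths containing spaces, a startup guard against the affected range can be sketched as below. This is only an illustration: BeamVersionGuard and its methods are hypothetical names and are not part of Beam, the range is taken from the "Affects Versions" and "Fix For" fields of this issue, and obtaining the running SDK version string is left to the caller.

```java
/** Hypothetical helper (not part of Beam): flags SDK versions listed as affected by BEAM-10261. */
public final class BeamVersionGuard {
    // Assumption: the affected range is [2.20.0, 2.26.0), per the issue's
    // "Affects Versions" and "Fix For" fields.
    private static final int[] FIRST_AFFECTED = {2, 20, 0};
    private static final int[] FIRST_FIXED = {2, 26, 0};

    private BeamVersionGuard() {}

    /** Parses "major.minor.patch"; a suffix such as "-SNAPSHOT" is ignored. */
    static int[] parse(String version) {
        String core = version.split("-", 2)[0];
        String[] parts = core.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected major.minor.patch, got: " + version);
        }
        int[] out = new int[3];
        for (int i = 0; i < 3; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    /** Compares two parsed versions numerically, component by component. */
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    /** Returns true if the given version falls inside the affected range. */
    public static boolean isAffected(String version) {
        int[] v = parse(version);
        return compare(v, FIRST_AFFECTED) >= 0 && compare(v, FIRST_FIXED) < 0;
    }
}
```

The comparison is numeric rather than textual, so a version such as 2.3.0 is correctly treated as older than 2.20.0.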

> [FileIO] Unexpected exception thrown when retrieving a GCS file with a space inside path
> ----------------------------------------------------------------------------------------
>
>                 Key: BEAM-10261
>                 URL: https://issues.apache.org/jira/browse/BEAM-10261
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>    Affects Versions: 2.20.0, 2.21.0, 2.22.0, 2.23.0, 2.24.0, 2.25.0
>         Environment: Google Cloud Dataflow
>            Reporter: Xavier HAUSHERR
>            Priority: P1
>              Labels: bug, gcs, java, storage
>             Fix For: 2.26.0
>
>
> Hi,
> I am using a PTransform class that retrieves Google Cloud Storage files with FileIO. It worked very well before version 2.20.0.
> Last week I upgraded my Beam library to 2.20.0 and 2.21.0, and now I get an unexpected exception when I retrieve files with a space inside the path:
> {code:java}
> Error message from worker: java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.io.FileNotFoundException: Item not found: 'gs://[MY_BUCKET]/2017/09/12/3d9d7cc8-e970-42f8-9f24-7d9b70989033/31/a9/ba/<1710RH600@optimashipbroking.com /body.txt'. If you enabled STRICT generation consistency, it is possible that the live version is still available but the intended generation is deleted. org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:184)
> {code}
>  
> Please note that the following gsutil command works:
> {code:bash}
> gsutil ls "gs://[MY_BUCKET]/2017/09/12/3d9d7cc8-e970-42f8-9f24-7d9b70989033/31/a9/ba/<1710RH600@optimashipbroking.com /body.txt"{code}
>  
> Here is my code:
> {code:java}
> public PCollection<KV<String, byte[]>> expand(PBegin begin) {
>     PCollection<KV<String, byte[]>> files = begin
>         .apply(FileIO.match()
>             .filepattern("gs://[MY_BUCKET]/**/body.txt")
>             .withEmptyMatchTreatment(EmptyMatchTreatment.ALLOW))
>         .apply(FileIO.readMatches())
>         .apply("Extract key",
>             ParDo.of(
>                 new DoFn<ReadableFile, KV<String, byte[]>>() {
>                     @ProcessElement
>                     public void processElement(ProcessContext c) throws IOException {
>                         ReadableFile f = c.element();
>                         c.output(KV.of(f.getMetadata().resourceId().toString(), f.readFullyAsBytes()));
>                     }
>                 }
>             )
>         );
>     return files;
> }
> {code}
>  
> Maybe I just need to find a way to escape the file path, but I don't know how.
>  
> I hope you can help me. 
>  
> Xavier
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)