Posted to issues@beam.apache.org by "Kenneth Knowles (Jira)" <ji...@apache.org> on 2022/01/14 19:33:00 UTC

[jira] [Commented] (BEAM-12771) InvalidPathException on Windows when removing staging directory

    [ https://issues.apache.org/jira/browse/BEAM-12771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476362#comment-17476362 ] 

Kenneth Knowles commented on BEAM-12771:
----------------------------------------

This code was written a _long_ time ago, and the project no longer has continuous Windows testing (that I am aware of).

My understanding of {{ResourceId}} is that it should not include any glob, but should essentially be a file path denoting a file or blob on S3/GCS/etc. The expectation for resolving a wildcard would be to use {{FileSystems.match}}. I could be wrong. Obviously this could be inefficient if we want to do a wildcard delete.
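
To make the match-then-delete idea concrete, here is a minimal sketch using only plain java.nio, not Beam's API. The names GlobDelete, match and deleteMatching are hypothetical, and Files.newDirectoryStream stands in for {{FileSystems.match}} purely for illustration. The point is that the glob is passed as a separate string and expanded to concrete paths first, so a '*' never ends up inside a Path.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Beam: expand a glob first, then delete.
class GlobDelete {

  /** Lists the entries of a directory that match the glob. The glob is never put into a Path. */
  static List<Path> match(Path directory, String glob) throws IOException {
    List<Path> matches = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory, glob)) {
      for (Path entry : stream) {
        matches.add(entry);
      }
    }
    return matches;
  }

  /** Deletes every regular file matching the glob and returns how many were removed. */
  static int deleteMatching(Path directory, String glob) throws IOException {
    int deleted = 0;
    for (Path path : match(directory, glob)) {
      if (Files.isRegularFile(path)) {
        Files.delete(path);
        deleted++;
      }
    }
    return deleted;
  }

  public static void main(String[] args) throws IOException {
    // Assumed stand-in for a job staging directory.
    Path staging = Files.createTempDirectory("staging-sketch");
    Files.createFile(staging.resolve("a.jar"));
    Files.createFile(staging.resolve("b.jar"));
    System.out.println("deleted " + deleteMatching(staging, "*") + " files");
    Files.delete(staging);
  }
}
```

This also shows the inefficiency mentioned above: every match has to be listed before it can be deleted.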

> InvalidPathException on Windows when removing staging directory 
> ----------------------------------------------------------------
>
>                 Key: BEAM-12771
>                 URL: https://issues.apache.org/jira/browse/BEAM-12771
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-files, sdk-java-core
>    Affects Versions: 2.31.0
>         Environment: Windows
>            Reporter: Thomas Krause
>            Priority: P3
>
> When running the word count example on Windows using, e.g., the Flink runner, an InvalidPathException is thrown when execution finishes. The message is:
>  
> {code:java}
> INFO:apache_beam.utils.subprocess_server:b'WARNUNG: Failed to remove job staging directory for token job_941a4c92-f66d-4d71-8b68-b16cd5026750.'
> INFO:apache_beam.utils.subprocess_server:b'java.nio.file.InvalidPathException: Illegal char <*> at index 136: C:\\Users\\Thomas\\AppData\\Local\\Temp\\beam-tempgkbgwplc\\artifactsuu86zi08\\d82c20307c13adfba9486c5a114a464cc3fb072ad60623a58839256a91a6c9f8\\*'
> INFO:apache_beam.utils.subprocess_server:b'\tat sun.nio.fs.WindowsPathParser.normalize(Unknown Source)'
> INFO:apache_beam.utils.subprocess_server:b'\tat sun.nio.fs.WindowsPathParser.parse(Unknown Source)'
> INFO:apache_beam.utils.subprocess_server:b'\tat sun.nio.fs.WindowsPathParser.parse(Unknown Source)'
> INFO:apache_beam.utils.subprocess_server:b'\tat sun.nio.fs.WindowsPath.parse(Unknown Source)'
> INFO:apache_beam.utils.subprocess_server:b'\tat sun.nio.fs.WindowsFileSystem.getPath(Unknown Source)'
> INFO:apache_beam.utils.subprocess_server:b'\tat java.nio.file.Paths.get(Unknown Source)'
> INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.sdk.io.LocalResourceId.resolveLocalPathWindowsOS(LocalResourceId.java:103)'
> INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.sdk.io.LocalResourceId.resolve(LocalResourceId.java:65)'
> INFO:apache_beam.runners.portability.portable_runner:Job state changed to DONE
> INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.sdk.io.LocalResourceId.resolve(LocalResourceId.java:36)'
> INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService$1.removeStagedArtifacts(ArtifactStagingService.java:182)'
> INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService.removeStagedArtifacts(ArtifactStagingService.java:115)'
> INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.runners.jobsubmission.JobServerDriver.lambda$createJobService$0(JobServerDriver.java:66)'
> INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.runners.jobsubmission.InMemoryJobService.lambda$run$0(InMemoryJobService.java:261)'
> INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.runners.jobsubmission.JobInvocation.setState(JobInvocation.java:249)'
> INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.runners.jobsubmission.JobInvocation.access$200(JobInvocation.java:51)'
> INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.runners.jobsubmission.JobInvocation$1.onSuccess(JobInvocation.java:115)'
> INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.runners.jobsubmission.JobInvocation$1.onSuccess(JobInvocation.java:101)'
> INFO:apache_beam.utils.subprocess_server:b'\tat org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1058)'
> INFO:apache_beam.utils.subprocess_server:b'\tat java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)'
> INFO:apache_beam.utils.subprocess_server:b'\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)'
> INFO:apache_beam.utils.subprocess_server:b'\tat java.lang.Thread.run(Unknown Source)'{code}
> The error seems to be in this function:
> {code:java}
> private LocalResourceId resolveLocalPathWindowsOS(String other, ResolveOptions resolveOptions) {
>     String uuid = UUID.randomUUID().toString();
>     Path pathAsterisksReplaced = Paths.get(pathString.replaceAll("\\*", uuid));
>     String otherAsterisksReplaced = other.replaceAll("\\*", uuid);
>     return new LocalResourceId(
>         Paths.get(
>             pathAsterisksReplaced
>                 .resolve(otherAsterisksReplaced)
>                 .toString()
>                 .replaceAll(uuid, "\\*")),
>         resolveOptions.equals(StandardResolveOptions.RESOLVE_DIRECTORY));
>   }
> {code}
> Paths.get throws an exception because it does not support wildcards on Windows. The function already replaces the wildcard with 'uuid' before the first call to Paths.get, but in the return statement Paths.get is called again on a string in which uuid has been replaced by the wildcard again, which of course throws the exception.
> Unfortunately I don't really understand the logic of the function, so I'm not sure what the best fix would be.
>  
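
For illustration only, the placeholder round trip in the function quoted above can be sketched in isolation. This is not the fix adopted by the project, and the names WildcardResolve and resolveKeepingWildcards are hypothetical. The sketch assumes the caller can accept a String: the '*' is swapped for a UUID before Paths.get and swapped back only on the resulting string, so no Path containing '*' is ever constructed. The real LocalResourceId appears to be constructed from a Path, so it would need a different design.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

// Hypothetical sketch, not Beam code: resolve a child against a base while tolerating '*'.
class WildcardResolve {

  /**
   * Resolves other against base. Any '*' is hidden behind a random placeholder while
   * java.nio does the resolution, then restored on the String result only.
   */
  static String resolveKeepingWildcards(String base, String other) {
    String placeholder = UUID.randomUUID().toString();
    // String.replace is a literal replacement, so no regex escaping is needed.
    Path basePath = Paths.get(base.replace("*", placeholder));
    String otherReplaced = other.replace("*", placeholder);
    // The wildcard goes back into a String and is never handed to Paths.get again.
    return basePath.resolve(otherReplaced).toString().replace(placeholder, "*");
  }

  public static void main(String[] args) {
    System.out.println(resolveKeepingWildcards("staging", "*"));
  }
}
```

The result uses the platform's separator, so it is staging joined to * by a backslash on Windows and by a forward slash elsewhere.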


