Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2021/07/07 17:21:01 UTC

[jira] [Updated] (BEAM-1309) FileIOChannelFactory.match() traverses entire parent directory recursively

     [ https://issues.apache.org/jira/browse/BEAM-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Beam JIRA Bot updated BEAM-1309:
--------------------------------
    Labels: Clarified  (was: Clarified stale-P2)

> FileIOChannelFactory.match() traverses entire parent directory recursively
> --------------------------------------------------------------------------
>
>                 Key: BEAM-1309
>                 URL: https://issues.apache.org/jira/browse/BEAM-1309
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Priority: P3
>              Labels: Clarified
>
> I was running a pipeline that reads a single file from my local home directory.
> The pipeline got stuck, and upon taking a stack snapshot, I noticed that it was stuck in FileIOChannelFactory.match().
> The code currently works by recursively traversing the whole parent directory of the requested filepattern and checking each file it finds against the filepattern. In my case, that means traversing everything in my home directory, which is *a lot* (and includes remotely mounted directories).
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/FileIOChannelFactory.java#L109
> This is very wasteful and should be fixed.
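
One possible direction for a fix, given as a minimal sketch and not as the actual Beam change: resolve a literal file name directly, and for a glob list only the immediate parent directory instead of walking it recursively. The class name ShallowMatch and its methods are hypothetical and are not part of FileIOChannelFactory. The sketch assumes that wildcards appear only in the last path component and that java.nio glob syntax is acceptable; it does not handle backslash-escaped glob characters.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not Beam code: matches a file pattern without
// descending into subdirectories of the parent directory.
final class ShallowMatch {

  // True if the name contains a java.nio glob metacharacter.
  // Assumption: escaped metacharacters (e.g. "\*") are not supported.
  static boolean hasGlobChars(String name) {
    for (char c : name.toCharArray()) {
      if ("*?[]{}".indexOf(c) >= 0) {
        return true;
      }
    }
    return false;
  }

  // Returns the regular files directly inside `parent` whose names match
  // `namePattern`, sorted by path. No recursion takes place.
  static List<Path> match(Path parent, String namePattern) throws IOException {
    if (!hasGlobChars(namePattern)) {
      // Literal name: a single existence check, no directory listing at all.
      Path candidate = parent.resolve(namePattern);
      return Files.isRegularFile(candidate)
          ? Collections.singletonList(candidate)
          : Collections.<Path>emptyList();
    }
    List<Path> result = new ArrayList<>();
    // newDirectoryStream lists one directory level only and applies the glob
    // to each entry's file name.
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(parent, namePattern)) {
      for (Path entry : entries) {
        if (Files.isRegularFile(entry)) {
          result.add(entry);
        }
      }
    }
    Collections.sort(result);
    return result;
  }
}
```

With this shape, a call such as ShallowMatch.match(homeDir, "input.txt") touches only that one path, and a pattern such as "*.txt" costs one listing of the parent directory, regardless of how deep or how remote its subdirectories are.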


