Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2023/05/25 14:23:00 UTC

[jira] [Commented] (HADOOP-18753) S3AFileSystem doesn't consistently handle prefixes that are both files and directories between versions

    [ https://issues.apache.org/jira/browse/HADOOP-18753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17726243#comment-17726243 ] 

Steve Loughran commented on HADOOP-18753:
-----------------------------------------

you've just exceeded the envelope of the "sustainable filesystem metaphor"; you can also create objects under objects if you try hard (i.e. create() doesn't check for this, as the check would add overhead).

lots of other things can go very wrong at this point too, such as directory rename and delete
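
To make the create() point concrete, here is a toy sketch in Python rather than anything taken from S3A. It models the store as a flat dict of keys. The names store, head, ancestors, create_unchecked and create_checked are all hypothetical, and it assumes the "overhead" is one existence probe per ancestor path, which is a simplification.

```python
# Toy model only: a flat key -> bytes mapping standing in for a bucket.
# Nothing here is the S3A implementation; all names are illustrative.
store = {}
head_requests = 0  # counts simulated existence probes


def head(key):
    """Simulate an existence probe for exactly this key."""
    global head_requests
    head_requests += 1
    return key in store


def ancestors(key):
    """Return the parent paths of key, nearest first: a/b/c -> [a/b, a]."""
    parts = key.split("/")
    return ["/".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def create_unchecked(key, data):
    """Write the key without looking at its ancestors (no extra probes)."""
    store[key] = data


def create_checked(key, data):
    """Refuse to write under an existing object; costs a probe per ancestor."""
    for parent in ancestors(key):
        if head(parent):
            raise FileExistsError(f"{parent} is an object; cannot create {key}")
    store[key] = data


# An object and an object "under" it can both exist in a flat key space.
create_unchecked("a/b", b"parent")
create_unchecked("a/b/c", b"child")

# The checked variant pays for its safety in probes.
create_checked("x/y/z/w", b"deep")  # probes x/y/z, x/y and x first
```

In this model the unchecked write leaves both a/b and a/b/c in place, while the checked write for x/y/z/w issues three probes before storing anything.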

wontfix I'm afraid; sorry

> S3AFileSystem doesn't consistently handle prefixes that are both files and directories between versions
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-18753
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18753
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 3.3.4
>            Reporter: Helen Weng
>            Priority: Major
>
> We have a prefix structure where the prefix Spark reads is both a file and a directory. So s3://a/b is the file we are trying to read, but s3://a/b/c is also a file. In 3.2.1, listStatuses identifies a/b as a File, but a change in 3.3.4 now identifies a/b as a directory and tries to read a/b/c instead of a/b.
> When s3GetFileStatus is called on the path with StatusProbeEnum HEAD, the path is reported as a "File". However, innerListStatus first assumes that any "nonempty" prefix is a directory; it only calls s3GetFileStatus on empty directories and on the listObjects results for the prefix.
> Is this a known issue, and are there any suggestions for working around it without changing the prefix structure?
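
The ambiguity in the report can be reproduced with a toy model in Python; this is not S3A code. It assumes a flat key space holding both a/b and a/b/c, and the helper names (head, has_children, classify_head_first, classify_list_first) are hypothetical. The only thing it shows is that the answer for a/b depends on which probe runs first.

```python
# Toy model only: a flat set of object keys, as in the report.
# This does not reproduce S3AFileSystem internals.
bucket = {"a/b": b"parent data", "a/b/c": b"child data"}


def head(key):
    """Simulate a HEAD-style probe: is there an object at exactly this key?"""
    return key in bucket


def has_children(key):
    """Simulate a LIST-style probe: does any key sit under key + '/'?"""
    prefix = key.rstrip("/") + "/"
    return any(k.startswith(prefix) for k in bucket)


def classify_head_first(key):
    """Check the exact key first, then fall back to listing."""
    if head(key):
        return "file"
    return "directory" if has_children(key) else "missing"


def classify_list_first(key):
    """Treat any nonempty prefix as a directory before checking the key."""
    if has_children(key):
        return "directory"
    return "file" if head(key) else "missing"


print(classify_head_first("a/b"))  # file
print(classify_list_first("a/b"))  # directory
```

Both orderings agree on a/b/c, which has nothing beneath it; they only disagree on keys that are simultaneously an object and a prefix of other objects.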



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
