Posted to common-issues@hadoop.apache.org by "Harshavardhana (Jira)" <ji...@apache.org> on 2021/11/24 06:04:00 UTC

[jira] [Comment Edited] (HADOOP-18019) S3AFileSystem.s3GetFileStatus() doesn't find dir markers on minio

    [ https://issues.apache.org/jira/browse/HADOOP-18019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448375#comment-17448375 ] 

Harshavardhana edited comment on HADOOP-18019 at 11/24/21, 6:03 AM:
--------------------------------------------------------------------

MinIO can be deployed in a few ways:
 * server mode
 * gateway mode

Within server mode, there are single-drive, multi-drive, and multi-node multi-drive setups.

A single-drive setup has limitations, and that is most probably why this is being misconstrued here as not supported. With a multi-drive setup it works:

 
{code:java}
~ minio server /tmp/xl{0...3} --address ":9001"
Verifying if 2 buckets are consistent across drives...
Automatically configured API requests per node based on available memory on the system: 79
Status:         4 Online, 0 Offline. 
API: http://10.0.0.67:9001  http://172.16.3.3:9001  http://172.17.0.1:9001  http://172.18.0.1:9001  http://127.0.0.1:9001                 
RootUser: minio 
RootPass: minio123 
Console: http://10.0.0.67:37615 http://172.16.3.3:37615 http://172.17.0.1:37615 http://172.18.0.1:37615 http://127.0.0.1:37615         
RootUser: minio 
RootPass: minio123 
{code}
 

 

Create the bucket

 
{code:java}
~ mc mb myminio/test{code}
 

 

 
{code:java}
~ javac -cp hadoop-client-api-3.3.1.jar Test.java && java -cp commons-lang3-3.8.1.jar:commons-logging-1.1.3.jar:hadoop-client-api-3.3.1.jar:hadoop-client-runtime-3.3.1.jar:hadoop-aws-3.3.1.jar:htrace-core4-4.1.0-incubating.jar:slf4j-api-1.7.30.jar:aws-java-sdk-bundle-1.11.901.jar:. Test
S3AFileStatus{path=s3a://test/testdelta/_delta_log; isDirectory=true; modification_time=0; access_time=0; owner=harsha; group=harsha; permission=rwxrwxrwx; isSymlink=false; hasAcl=false; isEncrypted=true; isErasureCoded=false} isEmptyDirectory=FALSE eTag=null versionId=null
{code}
 

IMO this issue can be closed, since this is not a bug; rather, [~Tagar] ran with a single-drive setup.

You see this problem exactly when you run a single-drive setup:

 

 
{code:java}
~ minio server /mnt/single-drive{code}
 

 

 
{code:java}
~ javac -cp hadoop-client-api-3.3.1.jar Test.java && java -cp commons-lang3-3.8.1.jar:commons-logging-1.1.3.jar:hadoop-client-api-3.3.1.jar:hadoop-client-runtime-3.3.1.jar:hadoop-aws-3.3.1.jar:htrace-core4-4.1.0-incubating.jar:slf4j-api-1.7.30.jar:aws-java-sdk-bundle-1.11.901.jar:. Test
Exception in thread "main" java.io.FileNotFoundException: No such file or directory: s3a://test/testdelta/_delta_log
        at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3356)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3053)
        at Test.main(Test.java:17)
{code}
 

 

So yes, this is not supported in single-drive mode because of underlying limitations of its data format, which tries to preserve a POSIX namespace on the backend drive.

This limitation does not apply to erasure-coded setups.

The code used in this setup:

 
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Test {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.endpoint", "http://127.0.0.1:9001");
    conf.set("fs.s3a.path.style.access", "true");
    conf.set("fs.s3a.access.key", "minio");
    conf.set("fs.s3a.secret.key", "minio123");

    Path path = new Path("s3a://test");
    FileSystem fs = path.getFileSystem(conf);

    fs.mkdirs(new Path("/testdelta/_delta_log"));
    System.out.println(fs.getFileStatus(new Path("/testdelta/_delta_log")));
  }
}
{code}
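
As a rough local analogy for the single-drive limitation described above (this is not MinIO or S3A code), the sketch below uses only java.nio on a temporary directory. It assumes, for illustration, that a client infers a directory solely from the entries listed under its prefix; the class name EmptyDirProbe, both method names, and the file name 00000.json are hypothetical. On a POSIX tree an empty directory exists but has no children, so a listing-based probe misses it while a direct stat-style check finds it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class EmptyDirProbe {
  // Probe 1: ask the filesystem directly whether the path is a directory.
  public static boolean existsByStat(Path dir) {
    return Files.isDirectory(dir);
  }

  // Probe 2: infer the directory from its children, the way a
  // prefix listing would (assumption made for this illustration).
  public static boolean existsByListing(Path dir) throws IOException {
    if (!Files.isDirectory(dir)) {
      return false;
    }
    try (Stream<Path> children = Files.list(dir)) {
      return children.findAny().isPresent();
    }
  }

  public static void main(String[] args) throws IOException {
    // Temporary stand-in for a single backend drive.
    Path root = Files.createTempDirectory("single-drive");
    Path dir = Files.createDirectories(root.resolve("testdelta/_delta_log"));

    System.out.println("stat probe:    " + existsByStat(dir));
    System.out.println("listing probe: " + existsByListing(dir));

    // Once the directory holds a file, the listing probe sees it too.
    Files.createFile(dir.resolve("00000.json"));
    System.out.println("listing probe after adding a file: " + existsByListing(dir));
  }
}
```

If that assumption holds, the directory becomes visible to a listing-based probe as soon as it contains at least one object, which would explain why only freshly created empty directories are affected.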


> S3AFileSystem.s3GetFileStatus() doesn't find dir markers on minio
> -----------------------------------------------------------------
>
>                 Key: HADOOP-18019
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18019
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2
>         Environment: minio s3-compatible storage
>            Reporter: Ruslan Dautkhanov
>            Priority: Major
>
> Repro code:
> {code:java}
> val conf = new Configuration()  
> conf.set("fs.s3a.endpoint", "http://127.0.0.1:9000")
> conf.set("fs.s3a.path.style.access", "true")
> conf.set("fs.s3a.access.key", "user_access_key") 
> conf.set("fs.s3a.secret.key", "password")  
> val path = new Path("s3a://comcast-test")  
> val fs = path.getFileSystem(conf)  
> fs.mkdirs(new Path("/testdelta/_delta_log"))  
> fs.getFileStatus(new Path("/testdelta/_delta_log")){code}
> Fails with *FileNotFoundException* on Minio. The same code works in real S3.
> It also works in Hadoop 3.2 with Minio and earlier versions.
> Only fails on 3.3 and newer Hadoop branches.
> The reason as discovered by [~sadikovi] is actually a more fundamental one - Minio does not have empty directories (sort of), see [https://github.com/minio/minio/issues/2423].
> This works in Hadoop 3.2 because of this infamous "Is this necessary?" block of code
> [https://github.com/apache/hadoop/blob/branch-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2204-L2223]
> that was removed in Hadoop 3.3 -
> [https://github.com/apache/hadoop/blob/branch-3.3.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2179]
> and this causes the regression


