Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2021/11/11 12:56:00 UTC

[jira] [Created] (HADOOP-18004) abfs and s3a disk buffer factories to use UUIDs for file prefixes

Steve Loughran created HADOOP-18004:
---------------------------------------

             Summary: abfs and s3a disk buffer factories to use UUIDs for file prefixes
                 Key: HADOOP-18004
                 URL: https://issues.apache.org/jira/browse/HADOOP-18004
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs/azure, fs/s3
    Affects Versions: 3.3.2
            Reporter: Steve Loughran
            Assignee: Mehakmeet Singh


The disk buffers created in the s3a and abfs output streams use a simple String.format("datablock-%04d-", index) pattern as the prefix for the temporary file (File.createTempFile).

This means there will be contention for filenames across streams, especially across processes.

If each stream had a UUID prefix there would be no contention, but that would change the API. Alternatively, each disk block factory holds the UUID, and the index is simply the total number of blocks that factory has created.
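
A rough sketch of the second option, to make the idea concrete. This is not the existing Hadoop code: the class UuidDiskBlockFactory and its methods nextPrefix() and createBlockFile() are hypothetical names, and the exact prefix layout is an assumption. It only shows a per-factory UUID combined with a running block count.

```java
import java.io.File;
import java.io.IOException;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch only; not the actual s3a/abfs disk block factory.
 * Each factory instance owns one UUID, so prefixes from different
 * factories (and different processes) do not overlap.
 */
final class UuidDiskBlockFactory {
    // Generated once per factory instance.
    private final String factoryId = UUID.randomUUID().toString();

    // Total number of blocks this factory has created; doubles as the index.
    private final AtomicLong blocksCreated = new AtomicLong();

    /** Builds the prefix for the next block: "datablock-<uuid>-<index>-". */
    public String nextPrefix() {
        long index = blocksCreated.incrementAndGet();
        return String.format("datablock-%s-%04d-", factoryId, index);
    }

    /** Creates the buffer file for the next block in the given directory. */
    public File createBlockFile(File dir) throws IOException {
        return File.createTempFile(nextPrefix(), ".tmp", dir);
    }
}
```

With this shape the stream-facing API does not need to change: callers still ask the factory for a block, and the factory alone decides the prefix.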


