Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2021/02/11 16:14:16 UTC

[GitHub] [hadoop] billierinaldi commented on a change in pull request #1925: HADOOP-16948. Support single writer dirs.

billierinaldi commented on a change in pull request #1925:
URL: https://github.com/apache/hadoop/pull/1925#discussion_r574631046



##########
File path: hadoop-tools/hadoop-azure/src/site/markdown/abfs.md
##########
@@ -877,6 +877,21 @@ enabled for your Azure Storage account."
 The directories can be specified as comma separated values. By default the value
 is "/hbase"
 
+### <a name="singlewriteroptions"></a> Single Writer Options
+`fs.azure.singlewriter.directories`: Directories for single writer support
+can be specified comma separated in this config. By default, multiple
+clients will be able to write to the same file simultaneously. When writing
+to files contained within the directories specified in this config, the
+client will obtain a lease on the file that will prevent any other clients
+from writing to the file. The lease will be renewed by the client until the
+output stream is closed, after which it will be released. To revoke a client's
+write access for a file, the AzureBlobFileSystem breakLease method may be
+called.
+
+`fs.azure.lease.threads`: This is the size of the thread pool that will be
+used for lease operations for single writer directories. By default the value
+is 0, so it must be set to at least 1 to support single writer directories.

Review comment:
       > is there any validation here, that if a path in the local FS is to be leased then the executor count must be >1?
   
   Yes. SelfRenewingLease throws an exception if fewer than 1 lease thread is configured.
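
   To illustrate the rule, here is a minimal sketch of the same check done up front, when the configuration is read. This is a fail-fast variant, not what the PR does (there the exception comes from SelfRenewingLease). The class `SingleWriterConfigCheck` and its methods are hypothetical names, and plain arguments stand in for the Hadoop configuration lookup.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of hadoop-azure. It only mirrors the rule that
// single writer directories need at least one lease thread.
final class SingleWriterConfigCheck {

  private SingleWriterConfigCheck() {
  }

  /** Splits a comma-separated value, trimming whitespace and dropping empty entries. */
  static List<String> parseDirectories(String commaSeparated) {
    if (commaSeparated == null) {
      return Collections.emptyList();
    }
    return Arrays.stream(commaSeparated.split(","))
        .map(String::trim)
        .filter(dir -> !dir.isEmpty())
        .collect(Collectors.toList());
  }

  /** Throws if single writer directories are set but fewer than 1 lease thread is configured. */
  static void validate(String singleWriterDirs, int leaseThreads) {
    List<String> dirs = parseDirectories(singleWriterDirs);
    if (!dirs.isEmpty() && leaseThreads < 1) {
      throw new IllegalArgumentException(
          "fs.azure.lease.threads must be at least 1 when "
              + "fs.azure.singlewriter.directories is set; got " + leaseThreads);
    }
  }
}
```

   With this sketch, `validate("/hbase", 0)` is rejected, while `validate("", 0)` passes because no single writer directories are configured.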
   
   > what if I'm working with >1 FS? Will this configuration be per-fs? Or does it take a list of paths which can be full URIs to paths in a store?
   
   I believe `fs.azure.singlewriter.directories` accepts a list of full URIs (I will double-check), but all of them share the same pool of lease threads.
   
   I am also looking into whether the lease duration should be configurable. That would allow either a finite or an infinite lease, and with an infinite lease we could avoid frequent calls to the Azure API to renew it. (With an infinite lease, if the client stopped without releasing it, the lease would have to be broken explicitly before a different writer could obtain a new lease on the file. It's a tradeoff, and I could imagine both finite and infinite lease options being useful.)
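
   To put rough numbers on that tradeoff, here is a small sketch that estimates the fewest renew calls one writer would need. It relies on the Azure Blob lease rules (a finite lease lasts 15 to 60 seconds, and -1 means infinite). The class `LeaseDurationSketch` and its methods are hypothetical, and the estimate assumes one renewal per full lease period; a real client would renew earlier, so this is only a lower bound.

```java
// Hypothetical sketch, not part of hadoop-azure.
final class LeaseDurationSketch {

  static final int INFINITE_LEASE = -1;      // Azure value for a lease that never expires
  static final int MIN_FINITE_SECONDS = 15;  // shortest finite lease Azure accepts
  static final int MAX_FINITE_SECONDS = 60;  // longest finite lease Azure accepts

  private LeaseDurationSketch() {
  }

  /** Returns true for -1 (infinite) or a finite duration of 15 to 60 seconds. */
  static boolean isValidDuration(int leaseSeconds) {
    return leaseSeconds == INFINITE_LEASE
        || (leaseSeconds >= MIN_FINITE_SECONDS && leaseSeconds <= MAX_FINITE_SECONDS);
  }

  /**
   * Lower bound on renew calls for a write that holds the lease for writeSeconds,
   * assuming one renewal per full lease period: ceil(writeSeconds / leaseSeconds) - 1.
   */
  static long minRenewCalls(long writeSeconds, int leaseSeconds) {
    if (!isValidDuration(leaseSeconds)) {
      throw new IllegalArgumentException("invalid lease duration: " + leaseSeconds);
    }
    if (writeSeconds < 0) {
      throw new IllegalArgumentException("negative write time: " + writeSeconds);
    }
    if (leaseSeconds == INFINITE_LEASE || writeSeconds == 0) {
      return 0; // an infinite lease never needs renewing
    }
    return (writeSeconds + leaseSeconds - 1) / leaseSeconds - 1;
  }
}
```

   Under these assumptions, a stream held open for an hour needs at least 59 renewals with a 60-second lease and none with an infinite lease. The cost of the infinite option is the explicit break needed after a client failure.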



