Posted to common-issues@hadoop.apache.org by "Ocean Lua (Jira)" <ji...@apache.org> on 2020/03/31 08:56:00 UTC

[jira] [Updated] (HADOOP-16950) Extend Hadoop S3a access from single endpoint to multiple endpoints

     [ https://issues.apache.org/jira/browse/HADOOP-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ocean Lua updated HADOOP-16950:
-------------------------------
    Description: 
The Hadoop AWS client API supports access through only a single endpoint. However, object stores such as Ceph expose multiple endpoints, so the available storage resources cannot be fully used. To address this, we created a new implementation of S3AFileSystem that supports multi-endpoint access. After this optimization, system performance increases significantly.

Usage:
 1. Ensure the hadoop-aws API is available.
 2. Copy hadoop-aws-3.1.3.jar and aws-java-sdk-bundle-1.11.271.jar to the directory share/hadoop/common/lib in Hadoop (both jars are normally located in share/hadoop/tools/lib).
 3. In the file etc/hadoop/hadoop-env.sh, add the following (listing the two jars copied in step 2):
 export HADOOP_CLASSPATH=/(hadoop root directory)/share/hadoop/common/lib/hadoop-aws-3.1.3.jar:/(hadoop root directory)/share/hadoop/common/lib/aws-java-sdk-bundle-1.11.271.jar:$HADOOP_CLASSPATH
 4. Edit the configuration file "core-site.xml" and set the properties below:
 <property>
   <name>fs.s3a.s3.client.factory.impl</name>
   <value>org.apache.hadoop.fs.s3a.MultiAddrS3ClientFactory</value>
 </property>
 <property>
   <name>fs.s3a.endpoint</name>
   <value>http://addr1:port1,http://addr2:port2,...</value>
 </property>
 5. Optional configuration in "core-site.xml":
 <property>
   <name>fs.s3a.S3ClientSelector.class</name>
   <value>org.apache.hadoop.fs.s3a.RandomS3ClientSelector</value>
 </property>
 This property sets the s3a endpoint selection policy. The default is org.apache.hadoop.fs.s3a.RandomS3ClientSelector, which selects an endpoint completely at random. It can instead be set to org.apache.hadoop.fs.s3a.PathS3ClientSelector, which selects an endpoint based on the file path.
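The path-based selection policy in step 5 can be illustrated with a short sketch. The following is a hypothetical Python model, not the patch's actual Java API: the class name and its select method merely mirror PathS3ClientSelector, and the hashing scheme is an assumption. The idea it demonstrates is that hashing the file path makes selection deterministic (the same path always resolves to the same endpoint) while still spreading different paths across all endpoints configured in fs.s3a.endpoint.

```python
import zlib

class PathS3ClientSelector:
    """Hypothetical sketch of a path-based endpoint selector."""

    def __init__(self, endpoints):
        # endpoints is the comma-separated fs.s3a.endpoint value, split into a list
        self.endpoints = endpoints

    def select(self, path):
        # Hash the path and pick an endpoint by modulo, so repeated
        # accesses to the same file always hit the same endpoint.
        h = zlib.crc32(path.encode("utf-8"))
        return self.endpoints[h % len(self.endpoints)]

endpoints = "http://addr1:8080,http://addr2:8080".split(",")
selector = PathS3ClientSelector(endpoints)
print(selector.select("/data/part-00000"))
```

A random selector, by contrast, would pick an endpoint independently on every call; the path-based policy trades that uniformity for per-file endpoint affinity.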

  was:
The Hadoop AWS client API supports access through only a single endpoint. However, object stores such as Ceph expose multiple endpoints, so the available storage resources cannot be fully used. To address this, we created a new implementation of S3AFileSystem that supports multi-endpoint access. After this optimization, system performance increases significantly.
	
Usage:
1. Ensure the hadoop-aws API is available.
2.Copy hadoop-aws-3.1.1.jar and aws-java-sdk-bundle-1.11.271.jar to directory share/hadoop/common/lib in hadoop (hadoop-aws-3.1.1.jar and aws-java-sdk-bundle-1.11.271.jar are normally located at directory share/hadoop/tools/lib).
3.In file etc/hadoop/hadoop-env.sh, add the following:
export HADOOP_CLASSPATH=/(hadoop root directory)/share/hadoop/common/lib/hadoop-aws-3.1.1.jar:/(hadoop root directory)/share/hadoop/common/lib/hadoop-aws-3.1.3.jar:$HADOOP_CLASSPATH
4.Edit configuration file "core-site.xml" and set properties below:
  <property>
    <name>fs.s3a.s3.client.factory.impl</name>
    <value>org.apache.hadoop.fs.s3a.MultiAddrS3ClientFactory</value>
  </property>
  <property>
	<name>fs.s3a.endpoint</name>
	<value>http://addr1:port1,http://addr2:port2,...</value>
  </property>
5.Optional configuration in "core-site.xml":
    <property>
		<name>fs.s3a.S3ClientSelector.class</name>
		<value>org.apache.hadoop.fs.s3a.RandomS3ClientSelector</value>
	</property>
	This configuration is used to set the s3a service selection policy. The default value is org.apache.hadoop.fs.s3a.RandomS3ClientSelector, which is a completely random selector. The configuration can be set to  org.apache.hadoop.fs.s3a.PathS3ClientSelector, which is a selector according to the file path.


> Extend Hadoop S3a access from single endpoint to multiple endpoints
> -------------------------------------------------------------------
>
>                 Key: HADOOP-16950
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16950
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 3.1.3
>            Reporter: Ocean Lua
>            Priority: Major
>              Labels: Endpoint, ceph
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org