Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2020/03/31 10:43:05 UTC

[GitHub] [hadoop] Ocean-Lua opened a new pull request #1928: HADOOP-16950.Extend Hadoop S3a access from single endpoint to multipl…

URL: https://github.com/apache/hadoop/pull/1928
 
 
   Addresses the following issue:
   https://issues.apache.org/jira/browse/HADOOP-16950
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


[GitHub] [hadoop] steveloughran commented on issue #1928: HADOOP-16950.Extend Hadoop S3a access from single endpoint to multipl…

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/hadoop/pull/1928#issuecomment-607217312
 
 
   
   First, this is a pretty wild idea. It makes sense for things like Ceph, which tries to offer HA without any load-balancer games: the client needs to be clever.
   
   Second, this is how I review: https://github.com/steveloughran/formality/blob/master/styleguide/styleguide.md
   
   We also have a strict "no declared endpoint, no review" policy, as covered in the s3a testing doc. If you are doing something for ceph, I'm going to have to request that you also run against AWS S3 for regression testing.
   
   We also like our s3a patches to come with integration tests, with the ITest prefix, so we can verify things actually work against real S3 implementations. This is going to be pretty critical here. At the very least, a list of the same endpoint, repeated, should work for operations against the store.
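As a sketch of that "repeated endpoint" expectation (the `parse` helper and its behaviour are illustrative assumptions here, not existing s3a code): a configured endpoint list should de-duplicate, so a repeated entry is indistinguishable from a single one.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/**
 * Illustration only: this helper is hypothetical, not part of the
 * actual s3a code. It shows why "the same endpoint, repeated"
 * should collapse to single-endpoint behaviour.
 */
public class EndpointList {

    /** Split a comma-separated endpoint list, trimming entries and
     *  de-duplicating while preserving order. */
    static List<String> parse(String configured) {
        LinkedHashSet<String> unique = new LinkedHashSet<>();
        for (String entry : configured.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                unique.add(trimmed);
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // A repeated endpoint must behave exactly like a single one.
        System.out.println(
            parse("s3.eu-west-1.amazonaws.com, s3.eu-west-1.amazonaws.com"));
        // → [s3.eu-west-1.amazonaws.com]
    }
}
```

An integration test (an `ITest*` class in this scheme) would then run the standard store operations through a client configured with such a repeated list and assert they succeed.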
   
   
   And, this might disappoint you: all PRs have to be against hadoop trunk, currently v3.4.0.
   
   
   Regarding the code itself: 
   
   ## cglib & dynamic code generation vs. a new AmazonS3 implementation
   
   I really do not want cglib to be used here.
   * It and ASM have been brittle across JVM bytecode versions in the past; it is a recurrent issue (https://github.com/GoogleContainerTools/jib/issues/2015)
   * It adds complexity and maintenance grief
   
   Can't we just wrap the Amazon S3 client instead? We do that for the inconsistent client already, after all.
   
   The risk there is that new methods in later SDKs need to be tracked, but as it's a wrap/relay of AmazonS3, were any new methods to be added, the wrapper class would fail to compile with an "unimplemented method" error, so we would find out fast. The runbook for an SDK update will need to cover this extra work, but that's all.
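The wrapper pattern being suggested looks roughly like this. The real target is the (much larger) `com.amazonaws.services.s3.AmazonS3` interface from the AWS SDK; a tiny stand-in interface is used here so the sketch is self-contained, and all names are illustrative.

```java
/**
 * Sketch of the "wrap, don't bytecode-proxy" approach.
 * S3Client stands in for the real AmazonS3 interface.
 */
public class WrapperSketch {

    /** Stand-in for the AmazonS3 interface. */
    interface S3Client {
        String getObject(String bucket, String key);
    }

    /**
     * Pure relay. Because it implements the interface directly,
     * any method added to S3Client in a later SDK makes this class
     * fail to compile, surfacing the gap immediately -- the
     * compile-time safety cglib proxies would not give us.
     */
    static class DelegatingS3Client implements S3Client {
        private final S3Client delegate;

        DelegatingS3Client(S3Client delegate) {
            this.delegate = delegate;
        }

        @Override
        public String getObject(String bucket, String key) {
            return delegate.getObject(bucket, key);
        }
    }

    public static void main(String[] args) {
        // Lambda works because the stand-in has a single method.
        S3Client stub = (bucket, key) -> bucket + "/" + key;
        S3Client wrapped = new DelegatingS3Client(stub);
        System.out.println(wrapped.getObject("landsat-pds", "scene.txt"));
        // → landsat-pds/scene.txt
    }
}
```

The multi-endpoint logic would then live in such a delegating class, choosing among delegates per call, with no dynamic code generation involved.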
   
   ## Tests
   
   Yes, integration tests are mandatory, especially those exploring failure conditions. Unit tests are nice, but not a substitute.
   
   ## Documentation
   
   That too, especially extensions to the troubleshooting docs. The easiest way to add new docs would be a whole new markdown file in the site/ dir, with a reference off index.md.
   
   
   


[GitHub] [hadoop] steveloughran removed a comment on issue #1928: HADOOP-16950.Extend Hadoop S3a access from single endpoint to multipl…

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/hadoop/pull/1928#issuecomment-607217312
 
 
   
