You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "VisarBuza (via GitHub)" <gi...@apache.org> on 2023/07/26 08:28:07 UTC

[GitHub] [pinot] VisarBuza opened a new issue, #11182: Issue with S3PinotFs URL Encoding Interfering with Ceph S3 Compatibility

VisarBuza opened a new issue, #11182:
URL: https://github.com/apache/pinot/issues/11182

   We are using Apache Pinot with S3 as our deep store, specifically, Ceph as an S3 compatible storage. We are encountering an issue during the segment finalization process when a copy request is issued to S3, which consistently fails with a 400 error.
   
   Upon further inspection, it appears that this error originates from how S3PinotFs URL encodes the source key before passing it to the library. The encoding converts slashes ("/") into "%2F". This is problematic because Ceph's S3 integration doesn't interpret URL encoded slashes as delimiters, but rather as part of the key name. Ceph's S3 integration throws a failed to parse x-amx-copy-source header when receiving the request.
   
   Here is the relevant piece of the code from S3PinotFS.java:
   
   ```java
   private boolean copyFile(URI srcUri, URI dstUri) throws IOException {
     try {
       String encodedUrl = null;
       try {
         encodedUrl = URLEncoder.encode(srcUri.getHost() + srcUri.getPath(), StandardCharsets.UTF_8.toString());
       } catch (UnsupportedEncodingException e) {
         throw new RuntimeException(e);
       }
   
       String dstPath = sanitizePath(dstUri.getPath());
       CopyObjectRequest copyReq = generateCopyObjectRequest(encodedUrl, dstUri, dstPath, null);
       CopyObjectResponse copyObjectResponse = _s3Client.copyObject(copyReq);
       return copyObjectResponse.sdkHttpResponse().isSuccessful();
     } catch (S3Exception e) {
       throw new IOException(e);
     }
   }
   
   ```
   
   From our understanding, the Amazon S3 SDK internally URL encodes the source key if necessary (e.g., for special characters) but does not encode slashes, as they are used as delimiters. Furthermore, according to the [official AWS documentation](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html), source key delimiters should not be URL encoded beforehand when using the S3 SDK; this only applies to direct HTTP requests.
   
   Also after inspecting the AWS S3 SDK source code, we can clearly see that the SDK is Url encoding the source key [code snippet](https://github.com/aws/aws-sdk-java/blob/8cade0bc9cf72071bae24fd4d85c3ae182b7ed0f/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L4619).
   
   In addition to the problem stated above, it's worth noting that Pinot's current approach uses the copySource method, which has been deprecated by the AWS S3 SDK. The SDK now favors the sourceBucket and sourceKey methods.
   
   Should Pinot decide to transition to these new methods, the current issue might become even more problematic and extend beyond Ceph to other cloud providers as well. This is because the S3 SDK, in the new methods, automatically inserts a slash ("/") between the bucket name and the object key. This will result in a malformed x-amz-copy-source header of the form bucket-name/objectPrefix%2FobjectPostFix, where "%2F" represents a slash.
   
   Therefore, it is crucial to address the URL encoding issue to not only improve compatibility with Ceph and other S3-compatible storages but also to ensure a smoother transition to the updated S3 SDK methods.
   
   ```java
     private CopyObjectRequest generateCopyObjectRequest(String copySource, URI dest, String path,
         Map<String, String> metadata) {
       CopyObjectRequest.Builder copyReqBuilder =
           CopyObjectRequest.builder().copySource(copySource).destinationBucket(dest.getHost()).destinationKey(path);
       if (metadata != null) {
         copyReqBuilder.metadata(metadata).metadataDirective(MetadataDirective.REPLACE);
       }
       if (!_disableAcl) {
         copyReqBuilder.acl(ObjectCannedACL.BUCKET_OWNER_FULL_CONTROL);
       }
       if (_serverSideEncryption != null) {
         copyReqBuilder.serverSideEncryption(_serverSideEncryption).ssekmsKeyId(_ssekmsKeyId);
         if (_ssekmsEncryptionContext != null) {
           copyReqBuilder.ssekmsEncryptionContext(_ssekmsEncryptionContext);
         }
       }
       return copyReqBuilder.build();
     }
   ```
   
   Given this, we'd like to inquire:
   
   Is there a particular reason for the current URL encoding behavior?
   Could the URL encoding step be omitted, leaving the encoding to the S3 SDK?
   We believe that resolving this issue would improve compatibility with S3-compatible storage solutions like Ceph. We are also open to contributing to the resolution of this issue should the need arise.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on issue #11182: Issue with S3PinotFs URL Encoding Interfering with Ceph S3 Compatibility

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #11182:
URL: https://github.com/apache/pinot/issues/11182#issuecomment-1652275677

   Thanks for the research and detailed explanation. @snleee @swaminathanmanish Can you help take a look at this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org