Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/12 06:48:58 UTC

[GitHub] [iceberg] wang-x-xia opened a new pull request #2807: [2806]Dell EMC ECS catalog implementation

wang-x-xia opened a new pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807


   I created a new module for the Dell EMC ECS catalog implementation.
   
   The package "org.apache.iceberg.dell.emc.ecs" contains the following:
   
   1. Abstractions of object storage:
   
   | Class            | Description                            |
   | ---------------- | -------------------------------------- |
   | ObjectBaseKey    | the prefix of object key.              |
   | ObjectKey        | the object key.                        |
   | ObjectKeys       | the object key operations.             |
   | ObjectHeadInfo   | the basic information of an object.    |
   | EcsClient        | the abstract client of object storage. |
   | PropertiesSerDes | the properties de/serialization.       |
   
   2. ECS catalog implementations
   
   | Class              | Interface             |
   | ------------------ | --------------------- |
   | EcsCatalog         | Catalog               |
   | EcsFile            | InputFile, OutputFile |
   | EcsFileIO          | FileIO                |
   | EcsTableOperations | TableOperations       |
   
   The package "org.apache.iceberg.dell.emc.ecs.impl" implements EcsClient and some related interfaces.
   
   Because Dell EMC ECS extends the standard Amazon S3 API, we use the Amazon S3 SDK v1 (the v2 SDK doesn't allow the custom behavior we need).
   
   | Features                       | Method                                                | Doc                                                                                                                                                                      |
   | ------------------------------ | ----------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
   | replace object if eTag matches | EcsClient#replace                                     | undocumented                                                                                                                                                             |
   | create object if absent        | EcsClient#writeIfAbsent, EcsClient#copyObjectIfAbsent | [If-None-Match](http://doc.isilon.com/ECS/3.6/API/S3ObjectOperations_createOrUpdateObject_7916bd6f789d0ae0ff39961c0e660d00_ba672412ac371bb6cf4e69291344510e_detail.html) |
   | append bytes                   | EcsClient#outputStream                                | [Range](http://doc.isilon.com/ECS/3.6/API/S3ObjectOperations_createOrUpdateObject_7916bd6f789d0ae0ff39961c0e660d00_ba672412ac371bb6cf4e69291344510e_detail.html)         |
   
   For unit tests, I created an in-memory EcsClient implementation named MemoryEcsClient. It honors the same assumptions that EcsClient does.
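   
   The conditional operations in the feature table above can be illustrated with a small in-memory sketch. This is not the PR's MemoryEcsClient: the class name `InMemoryObjectStore`, the method signatures, and the choice to derive the eTag as an MD5 hex digest are all assumptions made for illustration. It only shows the semantics that "create if absent" (`If-None-Match: *`) and "replace if eTag matches" presumably need to provide.
   
   ```java
   import java.security.MessageDigest;
   import java.security.NoSuchAlgorithmException;
   import java.util.Optional;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.ConcurrentMap;

   /** Hypothetical in-memory store; names are illustrative, not the PR's MemoryEcsClient. */
   class InMemoryObjectStore {
     private final ConcurrentMap<String, byte[]> objects = new ConcurrentHashMap<>();

     /** Mimics a PUT with "If-None-Match: *": succeeds only if the key is absent. */
     boolean writeIfAbsent(String key, byte[] content) {
       return objects.putIfAbsent(key, content.clone()) == null;
     }

     /** Mimics a conditional replace: succeeds only if the stored eTag equals expectedETag. */
     boolean replace(String key, String expectedETag, byte[] content) {
       byte[] updated = content.clone();
       boolean[] swapped = {false};
       // computeIfPresent runs atomically per key on ConcurrentHashMap.
       objects.computeIfPresent(key, (k, current) -> {
         if (eTagOf(current).equals(expectedETag)) {
           swapped[0] = true;
           return updated;
         }
         return current;
       });
       return swapped[0];
     }

     /** Returns the current eTag of the object, if it exists. */
     Optional<String> eTag(String key) {
       return Optional.ofNullable(objects.get(key)).map(InMemoryObjectStore::eTagOf);
     }

     /** Assumption for this sketch: the eTag is the MD5 hex digest of the content. */
     static String eTagOf(byte[] content) {
       try {
         byte[] digest = MessageDigest.getInstance("MD5").digest(content);
         StringBuilder hex = new StringBuilder();
         for (byte b : digest) {
           hex.append(String.format("%02x", b & 0xff));
         }
         return hex.toString();
       } catch (NoSuchAlgorithmException e) {
         throw new IllegalStateException("MD5 digest is unavailable", e);
       }
     }
   }
   ```
   
   With a store like this, a second writer that still holds a stale eTag gets `false` from `replace` and has to re-read before retrying, which is the kind of behavior an atomic metadata swap would rely on.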
   
   Original issue: https://github.com/apache/iceberg/issues/2806


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 closed pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
jackye1995 closed pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807


   




[GitHub] [iceberg] wang-x-xia commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
wang-x-xia commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-880543936


   @rdblue  and @kbendick 
   
   Thanks for your suggestion!
   I'll separate this into 3 or 4 parts:
   
   1. EcsClient, which provides object access methods used in Catalog and FileIO.
   2. Implementations related to FileIO.
   3. Implementations related to Catalog.
   
   The first part may be split into separate PRs if it contains too many files.
   I need some time to finish this.




[GitHub] [iceberg] mechgouki commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
mechgouki commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-931905295


   The Dell EMC ECS SDK is already open source at https://github.com/EMCECS/ecs-object-client-java
   




[GitHub] [iceberg] melin commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
melin commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-1046040985


   @wang-x-xia Does ECS support Hudi?






[GitHub] [iceberg] jackye1995 commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-952304061


   @wang-x-xia do you plan to also create a new PR for the `EcsCatalog`, or keep updating this one? If you plan to create a new one, I will close this PR.




[GitHub] [iceberg] fpj commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
fpj commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-880705352


   Out of curiosity, do you use feature branches for large features in Iceberg, or do you typically prefer to merge the parts directly onto `master`?




[GitHub] [iceberg] rdblue commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-882927287


   @fpj, we prefer merging into master to avoid the need to re-review feature branches to get them into master. I think it works best when we can take a working branch and divide it up into working PRs that can be committed separately.




[GitHub] [iceberg] openinx commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
openinx commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-911335176


   FYI @jackye1995 & @yyanyy 






[GitHub] [iceberg] kbendick commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-878671554


   One suggestion: You might consider making this a package `dell`, similar to our package `aws`.
   
   I can add a GitHub PR tag for you.




[GitHub] [iceberg] rdblue commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-878685881


   @wang-x-xia, thanks for working on this. I'm glad to see proposed support for EMC!
   
   I think the first thing to do is to get this into more manageable chunks to review and commit. Is it possible to divide this into a FileIO implementation PR and then a Catalog and TableOperations PR? It would also be really helpful to add a bit more about what you're proposing to the description. For example: How does the catalog work? What is the atomic operation you're using?






[GitHub] [iceberg] wang-x-xia commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
wang-x-xia commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-951846575


   I closed the first PR, which created an abstraction of the ECS APIs. Because we now use our own SDK, that PR is redundant.
   
   And the second PR is now available: https://github.com/apache/iceberg/pull/3376
   




[GitHub] [iceberg] wang-x-xia commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
wang-x-xia commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-1047402711


   > @wang-x-xia Ecs support hudi?
   
   You can use the S3 protocol. Apache Hudi uses HDFS as its storage abstraction, so it won't get any additional benefits from ECS.




[GitHub] [iceberg] jackye1995 commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-930842682


   Had some offline discussion with @mechgouki. Here are some conclusions:
   1. We can make this the `S3Catalog` implementation in the AWS module. In the future, even if other vendors, including AWS S3, come up with similar semantics, we can add a catalog config to switch across implementations.
   2. We can add a catalog config like `use-append` to switch to the append output stream implementation in `S3FileIO`. Overall, the Netflix `S3OutputStream` still seems to be more performant, but append-optimized object storage looks like a reasonable use case to support.
   3. These all depend on the switch to the v1 SDK, but I think it is mutually beneficial given that v2 does not support custom headers for third-party vendors. Reverting to v1 can give Iceberg more vendor integrations and more features, and reduce the number of modules we need to create for new vendors.
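   
   As a rough sketch of what a property-driven switch such as `use-append` could look like: only the property name comes from the list above. The class `OutputStreamSelector`, the `Mode` enum, and the default of `false` are hypothetical, and nothing here reflects how `S3FileIO` is actually configured.
   
   ```java
   import java.util.Map;

   /** Hypothetical selector; the class, enum, and default value are illustrative assumptions. */
   class OutputStreamSelector {
     /** Property name taken from the discussion above. */
     static final String USE_APPEND = "use-append";

     /** DEFAULT stands for the existing output stream, APPEND for an append-based one. */
     enum Mode { DEFAULT, APPEND }

     /** Returns APPEND only when the property is set to "true" (case-insensitive). */
     static Mode select(Map<String, String> catalogProperties) {
       String value = catalogProperties.getOrDefault(USE_APPEND, "false");
       return Boolean.parseBoolean(value) ? Mode.APPEND : Mode.DEFAULT;
     }
   }
   ```
   
   Keeping the existing stream as the default means catalogs that never set the property would see no behavior change.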




[GitHub] [iceberg] openinx commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
openinx commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-912222584


   To make this clearer, I wrote the following table:
   
   | Tests                                          | Run in unit tests | Run at release                          | Public vendor services | Private vendor services |
   | ---------------------------------------------- | ----------------- | --------------------------------------- | ---------------------- | ----------------------- |
   | API mock tests                                 | YES               | YES                                     | required               | required                |
   | Service simulator                              | YES               | YES                                     | **optional**           | **required**            |
   | Integration tests against real vendor services | NO                | YES? Private services cannot be checked | required               | required                |




[GitHub] [iceberg] jackye1995 commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-959417005


   Closing the PR based on the conversations above.




[GitHub] [iceberg] wang-x-xia commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
wang-x-xia commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-879740682


   @kbendick 
   
   The package and module names have been changed.
   
   @rdblue 
   
   We want to provide a complete catalog solution, and it's hard to separate parts of the implementation into different PRs. I can give you more background on this catalog; I'm preparing something about it.




[GitHub] [iceberg] mechgouki commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
mechgouki commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-931901428


   > Based on the latest conversation with @mechgouki , Dell has decided to open source their own client SDK under BSD license, and will not go through the S3 SDK. So they will rewrite the PR to contribute their catalog and FileIO.
   
   Yes, this is correct. And we will take responsibility for supporting the client SDK.




[GitHub] [iceberg] melin commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
melin commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-1048373086


   Hudi supports S3, so it should be possible. Streaming writes are easier with Hudi; hopefully ECS will support Hudi.
   
   Xia ***@***.***> wrote on Tue, Feb 22, 2022 at 12:01:
   
   > @wang-x-xia <https://github.com/wang-x-xia> Ecs support hudi?
   >
   > Use S3 protocol. Apache Hudi uses the HDFS as its storage abstraction. So
   > it won't use additional benefits from ECS.
   






[GitHub] [iceberg] openinx commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
openinx commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-912225187


   > I generally agree that we want to be able to run tests that exercise the actual code against a working back-end
   
   @rdblue, your preference is definitely right if we don't consider private vendor services. Dell ECS cannot be publicly accessed, so when a release manager decides to check a release candidate, they would need to deploy Dell's software on the required hardware and hosts to verify correctness (free or paid? @mechgouki). That's why I think we need a service simulator provided by Dell ECS to align the protocol between the Iceberg tests and Dell's real production services.




[GitHub] [iceberg] jackye1995 commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-931548031


   Had some offline discussion with @danielcweeks, and we are exploring why SDK V2 could not achieve the goal. It seems that we can still set header through:
   
   ```
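       // Note: S3Client#putObject also takes a request body; `content` is a placeholder byte[].
       s3.putObject(PutObjectRequest.builder()
               .bucket(bucketName)
               .key(objectKey)
               .overrideConfiguration(AwsRequestOverrideConfiguration.builder()
                   .putHeader("If-None-Match", "*")
                   .build())
               .build(),
           RequestBody.fromBytes(content));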
   ```
   
   The SDK V1 just has that as a utility method in `ObjectMetadata`; all user metadata are just headers that have the prefix `x-amz-meta-`, based on https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html.
   
   Could you validate if that is the case? If we can set headers like this, can we move to implement this through the V2 SDK? @mechgouki 




[GitHub] [iceberg] kbendick edited a comment on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
kbendick edited a comment on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-880407046


   > @rdblue
   > 
   > We want to give a whole solution of catalog. It's hard to separate parts of impls into different PRs. I think I can give you more knowledge about this catalog. I'm preparing something about it.
   
   I agree with Ryan that it’s very hard to review PRs that are so large in scope.
   
   Sometimes, I’ve seen people have one main PR / mother PR, kept as a reference (which is updated as other PRs are reviewed). And then smaller PRs of some components (like the ones Ryan mentioned) are broken out for review, with possibly a reference to the whole PR for people to see the desired end picture (marking it as a draft or [DO NOT MERGE] etc).
   
   This way, contributors can review PRs that are more manageable in size, but the overview can still be provided if it really is that important. Just be sure to update the reference / mother PR based on updates you make to the others.
   
   Ideally, parts are well enough contained to be reviewable on their own. But I do agree with Ryan, that if you want to get this in more quickly, it would be most advisable to break it up into more manageable chunks (along the API lines he mentioned would be a good place to start). 🙂
   




[GitHub] [iceberg] mechgouki commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
mechgouki commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-883090166


   While we are refining the PR, I would like to hear feedback on how to do regression testing moving forward. In this initial PR, we will provide an in-memory ECS simulation as the test suite. Will this approach work for the community?
   The reason is that today ECS mainly focuses on on-premise/hybrid cloud use cases, with different appliance models (with different performance/capacity), so it would be hard for the community to run regression tests alone. That's why we suggest using the simulation here.
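
As an illustration of what an in-memory simulation could look like, here is a minimal sketch in Java. It is not the test suite from this PR: the class `InMemoryObjectStore` and its methods are hypothetical names, and the behavior of `append` on a missing key (it creates the object) is an assumption made for the sketch, not a statement about ECS.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory object store used as a test double.
// The class and method names are illustrative; they are not taken from the PR.
final class InMemoryObjectStore {

  private final Map<String, byte[]> objects = new ConcurrentHashMap<>();

  // Stores a copy of the bytes under the key, replacing any previous object.
  void put(String key, byte[] data) {
    objects.put(key, data.clone());
  }

  // Appends bytes to an existing object. Assumption of this sketch:
  // appending to a missing key creates the object.
  void append(String key, byte[] data) {
    objects.merge(key, data.clone(), (existing, extra) -> {
      byte[] joined = new byte[existing.length + extra.length];
      System.arraycopy(existing, 0, joined, 0, existing.length);
      System.arraycopy(extra, 0, joined, existing.length, extra.length);
      return joined;
    });
  }

  // Returns a copy of the stored bytes, or an empty Optional if the key is absent.
  Optional<byte[]> get(String key) {
    byte[] data = objects.get(key);
    return data == null ? Optional.empty() : Optional.of(data.clone());
  }

  // Removes the object and reports whether it existed.
  boolean delete(String key) {
    return objects.remove(key) != null;
  }
}
```

A unit test can exercise code against a store like this without any appliance. How closely such a double matches the real service's semantics is the question this thread goes on to discuss.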




[GitHub] [iceberg] yyanyy commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
yyanyy commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-912112920


   > Before we start to split the PR into smaller PRs, I think the Iceberg community needs to reach consensus about public/private vendor integration contributions. The iceberg-aws module is a great example: it provides independent mock unit tests for each small feature, and most importantly, Adobe provides the S3 integration test utility [com.adobe.testing:s3mock-junit4](https://github.com/adobe/S3Mock), which can launch a local mini S3 cluster for accessing the HTTP API (S3Mock acts as a real S3 HTTP server by implementing the S3 API on top of a local filesystem directory). The S3Mock simulator has thorough test coverage to guarantee that the local S3 has the same semantics as [AWS S3](https://aws.amazon.com/cn/s3/).
   > 
   > When I implemented [the aliyun OSS integration](https://github.com/apache/iceberg/pull/2230/files), I thought I should provide a similar object storage simulator to align the local tests with public Aliyun OSS, so I provided an [OSSMockApplication](https://github.com/apache/iceberg/pull/2230/files#diff-cae7d6bade136ee5e97da24f979e6352929af6df9d244a3afc3a94770396c1bc) and [TestLocalOSS](https://github.com/apache/iceberg/pull/2230/files#diff-f8329e3691562000032033a485ecc5e30bf6d6a3b7e25e5f8cdd4f4e387b604aR53) to align the semantics. In my personal view, I would prefer to provide a fully tested simulator for private vendor integrations so that we can build unit tests on top of it to verify correctness.
   > 
   > As we will introduce more and more public/private vendor integrations in the future, I think we should agree on the details of introducing a vendor as soon as possible, and provide a more complete guide for community contributors to follow.
   > 
   > FYI @rdblue & @danielcweeks .
   
   I think in an ideal world we should, but I'm not sure we need to completely block new contributions for cloud vendor integrations when no working backend library for the storage service is available for unit tests. In the aws module we have an [integration test](https://github.com/apache/iceberg/tree/master/aws/src/integration/java/org/apache/iceberg/aws) package that talks to the actual service. However, we don't run those tests during PR submission; they are run manually before each release. I think we should try to integrate them as automated tests to catch regressions. With or without a library that provides full functionality for unit testing, I think these integration tests are still valuable.




[GitHub] [iceberg] jackye1995 closed pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
jackye1995 closed pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807


   




[GitHub] [iceberg] wang-x-xia commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
wang-x-xia commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-952485745


   @jackye1995 
   
   No, the code in this PR won't be updated. I'll create a new PR for the catalog implementation.
   I thought some discussion on this PR was still active, so I didn't close it yesterday.
   Closing this PR is fine with me.
   




[GitHub] [iceberg] mechgouki commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
mechgouki commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-1006465450


   > @wang-x-xia Hello, we are trying to use ECS and Iceberg to build a data lake, following this Dell doc: https://www.delltechnologies.com/asset/zh-cn/products/storage/industry-market/apache-iceberg-dell-emc-ecs.pdf Where can we find this jar, iceberg-ecs-catalog-0.12.0.jar? Thanks.
   
   We had planned to merge this change into Iceberg 0.13, but due to the holidays it did not happen. Please send mail to the Dell EMC channel to get official support.




[GitHub] [iceberg] mechgouki commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
mechgouki commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-912267428


   Thanks @rdblue and @openinx for the feedback.
   Yes, today we (Dell EMC Object storage) mainly run as a private service, so even though we do have a process for customers to try it, that could be overkill for the community. So I would like to suggest that we provide a new S3 mock service (which will be based on the Adobe one and focus on our special extension APIs, since we already have good compatibility with AWS S3).
   
   For the real integration with our customers, we will take responsibility, instead of the community.




[GitHub] [iceberg] rdblue commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-911844201


   I generally agree that we want to be able to run tests that exercise the actual code against a working back-end, and not tests that use custom mocking at some level within the code being tested.
   
   




[GitHub] [iceberg] figurant commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
figurant commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-1006268406


   @wang-x-xia Hello, we are trying to use ECS and Iceberg to build a data lake, following this Dell doc: https://www.delltechnologies.com/asset/zh-cn/products/storage/industry-market/apache-iceberg-dell-emc-ecs.pdf
   Where can we find this jar, iceberg-ecs-catalog-0.12.0.jar?
   Thanks.




[GitHub] [iceberg] jackye1995 commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-931887755


   Based on the latest conversation with @mechgouki , Dell has decided to open source their own client SDK under BSD license, and will not go through the S3 SDK. So they will rewrite the PR to contribute their catalog and FileIO.




[GitHub] [iceberg] rdblue commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-880251210


   @wang-x-xia, it's really hard to review PRs that are larger than necessary. If you want to get this in more quickly, I suggest making it easier to review by dividing it up into reasonably sized PRs.




[GitHub] [iceberg] kbendick edited a comment on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
kbendick edited a comment on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-878671554


   One suggestion: You might consider making this a package `dell` similarly to our package `aws`.




[GitHub] [iceberg] rdblue commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-882927287


   @fpj, we prefer merging into master to avoid the need to re-review feature branches to get them into master. I think it works best when we can take a working branch and divide it up into working PRs that can be committed separately.




[GitHub] [iceberg] mechgouki commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
mechgouki commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-953485650


   @jackye1995 @openinx Just recording the offline discussion about the integration tests:
   
   - We will first try to merge the implementation and the client mock test suites, which you have already started reviewing.
   - We, not the community, will take responsibility for our customers running Iceberg on our products.
   - In parallel, we are developing the full integration mock service, but it needs time to pass our internal review first.




[GitHub] [iceberg] mechgouki commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
mechgouki commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-930797701


   > A few points I'd like to discuss:
   > 
   > 1. around private vendor of catalog implementation
   > 
   > I remember @rdblue you talked about the possibility of having a RESTful Catalog implementation to plugin, would that help this Dell use case?
   > 
   > 2. around S3 SDK version
   > 
   > I have been thinking recently a lot about the SDK version, and maybe we could consider reverting to v1, and Dell can contribute just a `S3Catalog` instead.
   > 
   > The reason I am thinking about reverting to v1 is because of client side encryption support. V2 was promised to offer client side encryption this summer which would let v2 SDK have full functionality compatibility with v1 plus supposedly better performance, but the whole project was significantly delayed and won't be done until years later. So there is also an ask for adding S3 client side encryption from user side, for which the only way to achieve that is through reverting to v1.
   > 
   > I think this version change could be done given the fact that nothing around AWS client is publicly exposed. Some work is needed to update documentation around the dependency jars to add. But if we see enough benefits in adopting S3-like private vendors by reintroducing V1, I think this seems to be the best way to go.
   > 
   > @danielcweeks what do you think about the S3 SDK situation?
   > 
   > @mechgouki if we reintroduce v1 SDK, do you think you still need the dell module, or could you just implement a `S3Catalog` instead in AWS module?
   
   @jackye1995 
   
   Basically, there are two areas where Dell EMC features could help:
   (1) An append operation in addition to MPU (multipart upload). If the client has little local cache for large objects (for example at the edge), the Dell EMC object service could help here.
   (2) Atomic rename. We have If-Match and If-None-Match semantics, as we have supported a strong consistency model (within one site) from the very beginning.
   
   To support these two features, we need changes in both FileIO and S3Catalog, so we cannot use the AWS module directly, though we could extend on top of the V1 SDK.
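
To make the second point concrete, the following sketch models `If-None-Match: *` and `If-Match` style writes against an in-memory map. Everything here is hypothetical (`ConditionalObjectStore`, `putIfAbsent`, `putIfMatch`, and the numeric tag standing in for an ETag). It doesn't call the ECS or S3 SDK, and it only shows why conditional writes let a single writer win when several try to update the same key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for an object store with conditional writes.
// None of these names come from the PR or from any vendor SDK.
final class ConditionalObjectStore {

  // A stored object: its bytes plus a version tag that plays the role of an ETag.
  static final class Entry {
    final byte[] data;
    final long eTag;

    Entry(byte[] data, long eTag) {
      this.data = data;
      this.eTag = eTag;
    }
  }

  private final Map<String, Entry> objects = new ConcurrentHashMap<>();

  // Mimics a PUT with "If-None-Match: *": succeeds only if the key does not exist yet.
  boolean putIfAbsent(String key, byte[] data) {
    return objects.putIfAbsent(key, new Entry(data.clone(), 1L)) == null;
  }

  // Mimics a PUT with "If-Match: <tag>": succeeds only if the stored tag
  // still equals expectedETag, and bumps the tag on success.
  boolean putIfMatch(String key, long expectedETag, byte[] data) {
    boolean[] swapped = {false};
    objects.computeIfPresent(key, (k, current) -> {
      if (current.eTag != expectedETag) {
        return current; // another writer committed first; keep its version
      }
      swapped[0] = true;
      return new Entry(data.clone(), current.eTag + 1);
    });
    return swapped[0];
  }

  // Returns the current entry, or null if the key is absent.
  Entry head(String key) {
    return objects.get(key);
  }
}
```

In this model a writer reads the current tag with `head`, prepares the new content, and calls `putIfMatch` with the tag it read. If another writer got there first, the call returns `false` and the caller can re-read and retry, which is the compare-and-swap behavior a catalog commit would need.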
   
   
   




[GitHub] [iceberg] mechgouki commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
mechgouki commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-912269882


   We are also moving to a cloud-native approach, so maybe in the future we could deploy on the cloud and run the integration tests there.
   
   But right now we would like to explore a new testing strategy with the community and get the ball rolling.




[GitHub] [iceberg] kbendick commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-880407046


   > @rdblue
   > 
   > We want to provide a complete catalog solution, and it's hard to separate parts of the implementation into different PRs. I can give you more background on this catalog; I'm preparing something about it.
   
   I agree with Ryan that it’s very hard to review PRs that are so large in scope.
   
   Sometimes, I’ve seen people have one main PR / mother PR, kept as a reference (which is updated as other PRs are reviewed). And then smaller PRs of some components (like the ones Ryan mentioned) are broken out for review, with possibly a reference to the whole PR for people to see the desired end picture.
   
   This way, contributors can review PRs that are more manageable in size, but the overview can still be provided if it really is that important. Just be sure to update the reference / mother PR based on updates you make to the others.
   
   Ideally, parts are well enough contained to be reviewable on their own. But I do agree with Ryan, that if you want to get this in more quickly, it would be most advisable to break it up into more manageable chunks (along the API lines he mentioned would be a good place to start). 🙂
   




[GitHub] [iceberg] kbendick edited a comment on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
kbendick edited a comment on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-878671554


   One suggestion: You might consider making the top level folder `dell`, similarly to our package `aws`.




[GitHub] [iceberg] jackye1995 commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-930787419


   A few points I'd like to discuss:
   
   1. around private vendor of catalog implementation
   
   I remember @rdblue you talked about the possibility of having a RESTful Catalog implementation to plugin, would that help this Dell use case?
   
   2. around S3 SDK version
   
   I have been thinking recently a lot about the SDK version, and maybe we could consider reverting to v1, and Dell can contribute just a `S3Catalog` instead. 
   
   The reason I am thinking about reverting to v1 is because of client side encryption support. V2 was promised to offer client side encryption this summer which would let v2 SDK have full functionality compatibility with v1 plus supposedly better performance, but the whole project was significantly delayed and won't be done until years later. So there is also an ask for adding S3 client side encryption from user side, for which the only way to achieve that is through reverting to v1. 
   
   I think this version change could be done given the fact that nothing around AWS client is publicly exposed. Some work is needed to update documentation around the dependency jars to add. But if we see enough benefits in adopting S3-like private vendors by reintroducing V1, I think this seems to be the best way to go. 
   
   @danielcweeks what do you think about the S3 SDK situation?
   
   @mechgouki if we reintroduce v1 SDK, do you think you still need the dell module, or could you just implement a `S3Catalog` instead in AWS module?
   




[GitHub] [iceberg] openinx commented on pull request #2807: [2806]Dell EMC ECS catalog implementation

Posted by GitBox <gi...@apache.org>.
openinx commented on pull request #2807:
URL: https://github.com/apache/iceberg/pull/2807#issuecomment-911334551


   Before we start to split the PR into smaller PRs, I think the Iceberg community needs to reach consensus about public/private vendor integration contributions. The iceberg-aws module is a great example: it provides independent mock unit tests for each small feature, and most importantly, Adobe provides the S3 integration test utility [com.adobe.testing:s3mock-junit4](https://github.com/adobe/S3Mock), which can launch a local mini S3 cluster for accessing the HTTP API (S3Mock acts as a real S3 HTTP server by implementing the S3 API on top of a local filesystem directory). The S3Mock simulator has thorough test coverage to guarantee that the local S3 has the same semantics as [AWS S3](https://aws.amazon.com/cn/s3/).
   
   When I implemented [the aliyun OSS integration](https://github.com/apache/iceberg/pull/2230/files), I thought I should provide a similar object storage simulator to align the local tests with public Aliyun OSS, so I provided an [OSSMockApplication](https://github.com/apache/iceberg/pull/2230/files#diff-cae7d6bade136ee5e97da24f979e6352929af6df9d244a3afc3a94770396c1bc) and [TestLocalOSS](https://github.com/apache/iceberg/pull/2230/files#diff-f8329e3691562000032033a485ecc5e30bf6d6a3b7e25e5f8cdd4f4e387b604aR53) to align the semantics. In my personal view, I would prefer to provide a fully tested simulator for private vendor integrations so that we can build unit tests on top of it to verify correctness.
   
   As we will introduce more and more public/private vendor integrations in the future, I think we should agree on the details of introducing a vendor as soon as possible, and provide a more complete guide for community contributors to follow.
   
   FYI @rdblue & @danielcweeks . 

