You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/21 14:02:17 UTC

[GitHub] [iceberg] wang-x-xia opened a new pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

wang-x-xia opened a new pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847


   It's the first part of https://github.com/apache/iceberg/pull/2807 .
   
   Dell EMC ECS is an Object Storage which delivers rich S3-compatibility.
   
   This PR is to add new interfaces that provide several features from the Dell EMC ECS.
   
   The main interface is `EcsClient`, it provides methods like:
   
   1. To get object info.
   2. To read and write object with Java InputStream/OutputStream that for data files.
   3. To read, replace, write and copy object that for metadata files.
   4. To list objects that for fetch concepts such as namespaces and databases.
   
   And there are 2 additional interfaces which provide helper method of client:
   
   1. `PropertiesSerDes` is used to convert `Map<String, String>` to file content. I add a default version value to avoid compatible problems in future.
   2. `ObjectKeys` is used to convert name of Iceberg resources to object key.
   
   The `EcsClientImpl` provide the impl that use Amazon AWS S3 Java SDK (V1) to connect with the ECS.
   The Amazon AWS S3 Java SDK (V2), used in the `:iceberg-aws` module, doesn't provide extension.
   
   The `MemoryEcsClient` provide the impl that mock all operations of `EcsClient` in memory. I will use in future tests.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#issuecomment-916992895


   @mechgouki, that sounds like reasonable motivation to go with the EMC client to me. What are the dependencies of the EMC artifact?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] mechgouki commented on pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
mechgouki commented on pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#issuecomment-912278449


   > @wang-x-xia, if EMC is S3 compatible and uses the v1 S3 API, why not make this an implementation of FileIO that uses the older S3 API? Can you help me understand why this is specific to EMC?
   
   The short answer is the V2 SDK added limitation for extension, while some extension APIs will benefit Iceberg to have better performance run on our product.
   As Xia said, we could switch to use DellEMC Client which also open source https://github.com/EMCECS/ecs-object-client-java  
   But we might need additional check on legal as the license is BSD.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] mechgouki edited a comment on pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
mechgouki edited a comment on pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#issuecomment-916531117


   > @wang-x-xia, the 3-clause BSD license is compatible with the Apache License: https://www.apache.org/legal/resolved.html
   > 
   > It is listed as Category A. So it should be fine.
   > 
   > What is the rationale for using a library specific to EMC rather than the S3 v1 API? It seems like the S3 v1 API is probably going to be useful to more people. Are there EMC features that we can only get using the EMC client? You mentioned append support, but all of the formats that Iceberg uses are immutable. So we never need to re-open a file and append more data.
   
   
   Basically we have 2 areas which Dell EMC features could benefit :
   (1)Append operation in additional of MPU, if the client has less local cache for large object( like edge), Dell EMC object service could help here.
   (2)Atomic rename. We have If-Match and If-None-Match semantic as we support strong consistency model (within one site) from the very beginning.
   
   And talked internally, for better support the customer, if we use Dell EMC features, we'd better use our own client with BSD License. AWS S3 V1 could be workaround here, but V2 can not be an option .
   
   Looking forward for your suggestion here


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wang-x-xia closed pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
wang-x-xia closed pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] mechgouki removed a comment on pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
mechgouki removed a comment on pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#issuecomment-916530749


   > @wang-x-xia, the 3-clause BSD license is compatible with the Apache License: https://www.apache.org/legal/resolved.html
   > 
   > It is listed as Category A. So it should be fine.
   > 
   > What is the rationale for using a library specific to EMC rather than the S3 v1 API? It seems like the S3 v1 API is probably going to be useful to more people. Are there EMC features that we can only get using the EMC client? You mentioned append support, but all of the formats that Iceberg uses are immutable. So we never need to re-open a file and append more data.
   
   
   Basically we have 2 areas which Dell EMC features could benefit :
   (1)Append operation in additional of MPU, if the client has less local cache for large object( like edge), Dell EMC object service could help here.
   (2)Atomic rename.  We have If-Match and If-None-Match semantic as we support strong consistency model (within one site) from the very beginning.
   
   And talked internally, for better support the customer, if we use Dell EMC  features, we'd better use our own client with BSD License.  AWS S3 V1 could be workaround here, but V2 can not be a option .
   
   Looking forward for your suggestion here
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#issuecomment-916513588


   @wang-x-xia, the 3-clause BSD license is compatible with the Apache License: https://www.apache.org/legal/resolved.html
   
   It is listed as Category A. So it should be fine.
   
   What is the rationale for using a library specific to EMC rather than the S3 v1 API? It seems like the S3 v1 API is probably going to be useful to more people. Are there EMC features that we can only get using the EMC client? You mentioned append support, but all of the formats that Iceberg uses are immutable. So we never need to re-open a file and append more data.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] mechgouki commented on pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
mechgouki commented on pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#issuecomment-917994009


   > @mechgouki, that sounds like reasonable motivation to go with the EMC client to me. What are the dependencies of the EMC artifact?
   
   @rdblue 
   We listed all the dependencies at https://github.com/EMCECS/ecs-object-client-java/tree/master/3rd-party-licenses
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#discussion_r701232798



##########
File path: build.gradle
##########
@@ -406,6 +406,7 @@ project(':iceberg-flink-runtime') {
     compile(project(':iceberg-nessie')) {
       exclude group: 'com.google.code.findbugs', module: 'jsr305'
     }
+    compile(project(':iceberg-dell'))

Review comment:
       Please remove additions to the runtime Jars until later. We add new integrations to the runtime Jars after they are finished, and we also look at the dependencies that are added to the runtime Jar as a result. We need to be careful with runtime Jar licensing.
   
   I don't think that we want to start worrying about updating LICENSE files and looking at sizes just yet.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wang-x-xia commented on a change in pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
wang-x-xia commented on a change in pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#discussion_r702439437



##########
File path: dell/src/main/java/org/apache/iceberg/dell/ObjectKey.java
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.dell;
+
+import java.util.Objects;
+
+/**
+ * a immutable record class of object key

Review comment:
       Fixed!

##########
File path: dell/src/main/java/org/apache/iceberg/dell/ObjectKey.java
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.dell;
+
+import java.util.Objects;
+
+/**
+ * a immutable record class of object key
+ */
+public class ObjectKey {
+  /**
+   * bucket

Review comment:
       Removed!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#discussion_r701235870



##########
File path: dell/src/main/java/org/apache/iceberg/dell/ObjectKey.java
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.dell;
+
+import java.util.Objects;
+
+/**
+ * a immutable record class of object key
+ */
+public class ObjectKey {
+  /**
+   * bucket

Review comment:
       Please omit useless comments. Repeating the field name is not helpful.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#discussion_r701237486



##########
File path: dell/src/main/java/org/apache/iceberg/dell/ObjectKey.java
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.dell;
+
+import java.util.Objects;
+
+/**
+ * a immutable record class of object key
+ */
+public class ObjectKey {
+  /**
+   * bucket
+   */
+  private final String bucket;
+
+  /**
+   * key
+   */
+  private final String key;
+
+  public ObjectKey(String bucket, String key) {
+    if (bucket == null || key == null) {
+      throw new IllegalArgumentException(String.format("bucket %s and key %s must be not null", bucket, key));
+    }
+    this.bucket = bucket;
+    this.key = key;
+  }
+
+  public String getBucket() {
+    return bucket;
+  }
+
+  public String getKey() {
+    return key;
+  }
+
+  @Override
+  public String toString() {

Review comment:
       Does this not form a location, like `emc://bucket/path`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] mechgouki commented on pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
mechgouki commented on pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#issuecomment-916531117


   > @wang-x-xia, the 3-clause BSD license is compatible with the Apache License: https://www.apache.org/legal/resolved.html
   > 
   > It is listed as Category A. So it should be fine.
   > 
   > What is the rationale for using a library specific to EMC rather than the S3 v1 API? It seems like the S3 v1 API is probably going to be useful to more people. Are there EMC features that we can only get using the EMC client? You mentioned append support, but all of the formats that Iceberg uses are immutable. So we never need to re-open a file and append more data.
   
   
   Basically we have 2 areas which Dell EMC features could benefit :
   (1)Append operation in additional of MPU, if the client has less local cache for large object( like edge), Dell EMC object service could help here.
   (2)Atomic rename. We have If-Match and If-None-Match semantic as we support strong consistency model (within one site) from the very beginning.
   
   And talked internally, for better support the customer, if we use Dell EMC features, we'd better use our own client with BSD License. AWS S3 V1 could be workaround here, but V2 can not be a option .
   
   Looking forward for your suggestion here


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] mechgouki edited a comment on pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
mechgouki edited a comment on pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#issuecomment-898145724


   @kbendick @rdblue 
   We would like to hear the suggestion whether we should wait for this the first new PR(1 of 3) pass the review and then start to post the rest ?
   Or
   Should we just go ahead to post the rest 2 PRs now?
    
    @openinx 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wang-x-xia commented on a change in pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
wang-x-xia commented on a change in pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#discussion_r702441207



##########
File path: dell/src/main/java/org/apache/iceberg/dell/PropertiesSerDes.java
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.dell;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.OutputStreamWriter;
+import java.io.UncheckedIOException;
+import java.nio.charset.StandardCharsets;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Properties;
+import java.util.Set;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * convert Map properties to bytes.
+ */
+public interface PropertiesSerDes {
+
+  /**
+   * version of properties.
+   * <p>
+   * this properties is only a placeholder and may use in future.
+   */
+  String CURRENT_VERSION = "0";

Review comment:
       No, it's in object user metadata. We wish to use it if the properties format needs to change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] mechgouki commented on pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
mechgouki commented on pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#issuecomment-913384794


   > @wang-x-xia, if EMC is S3 compatible and uses the v1 S3 API, why not make this an implementation of FileIO that uses the older S3 API? Can you help me understand why this is specific to EMC?
   
   Discussed internally, we plan to switch to DellEMC ECS Client lib here as it is fully open source and we are supporting/will support it 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#issuecomment-911868302


   @wang-x-xia, if EMC is S3 compatible and uses the v1 S3 API, why not make this an implementation of FileIO that uses the older S3 API? Can you help me understand why this is specific to EMC?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wang-x-xia commented on pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
wang-x-xia commented on pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#issuecomment-912218155


   > 
   > 
   > @wang-x-xia, if EMC is S3 compatible and uses the v1 S3 API, why not make this an implementation of FileIO that uses the older S3 API? Can you help me understand why this is specific to EMC?
   
   ECS can append data to object. So the complex data cache of the S3 FileIO impl can be omit. 
   
   The ECSClient interface is for that if Amazon deprecated their V1 SDK, it's easily to create a new implementation of this interface.
   
   Actually, the V2 SDK is same with V1 SDK. But just V1 SDK allow to custom HTTP headers that can help ECS and other object services to provide some extension feature.
   If Iceberg need a V1-SDKed FileIO, I can help to implement one. The object writes will be same with current AWS implementation with a local data cache.
   
   ECS also have its own SDK, but the license may cause some problems. Finally I used the S3 V1 SDK.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wang-x-xia commented on a change in pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
wang-x-xia commented on a change in pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#discussion_r702439409



##########
File path: dell/src/main/java/org/apache/iceberg/dell/EcsClient.java
##########
@@ -0,0 +1,226 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.dell;
+
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Function;
+import org.apache.iceberg.catalog.Namespace;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.exceptions.NoSuchNamespaceException;
+
+/**
+ * ECS client and its util-methods.
+ */
+public interface EcsClient extends AutoCloseable {
+
+  /**
+   * utils of object key that provide methods to convert {@link ObjectKey} to other objects declared in Iceberg.
+   *
+   * @return utils class
+   */
+  ObjectKeys getKeys();
+
+  /**
+   * utils of properties that provide methods to convert object to properties.
+   *
+   * @return utils class
+   */
+  default PropertiesSerDes getPropertiesSerDes() {

Review comment:
       Fixed!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wang-x-xia commented on a change in pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
wang-x-xia commented on a change in pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#discussion_r702441135



##########
File path: dell/src/main/java/org/apache/iceberg/dell/impl/ContentAndETagImpl.java
##########
@@ -0,0 +1,42 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.dell.impl;
+
+import org.apache.iceberg.dell.EcsClient;
+import org.apache.iceberg.dell.ObjectHeadInfo;
+
+/**
+ * a record class of {@link EcsClient.ContentAndETag}
+ */
+public class ContentAndETagImpl implements EcsClient.ContentAndETag {
+
+  private final ObjectHeadInfo headInfo;
+  private final byte[] content;
+
+  public ContentAndETagImpl(ObjectHeadInfo headInfo, byte[] content) {
+    this.headInfo = headInfo;
+    this.content = content;

Review comment:
       Content is all bytes data of an object. It's for properties objects.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#discussion_r701251390



##########
File path: dell/src/main/java/org/apache/iceberg/dell/PropertiesSerDes.java
##########
@@ -0,0 +1,119 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.dell;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.OutputStreamWriter;
+import java.io.UncheckedIOException;
+import java.nio.charset.StandardCharsets;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Properties;
+import java.util.Set;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * convert Map properties to bytes.
+ */
+public interface PropertiesSerDes {
+
+  /**
+   * version of properties.
+   * <p>
+   * this properties is only a placeholder and may use in future.
+   */
+  String CURRENT_VERSION = "0";

Review comment:
       What is the value of this if it isn't serialized into the blob?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#discussion_r701235348



##########
File path: dell/src/main/java/org/apache/iceberg/dell/ObjectKey.java
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.dell;
+
+import java.util.Objects;
+
+/**
+ * a immutable record class of object key

Review comment:
       Minor: Please use sentence case in Javadoc. Sentences should start with a capital letter and end with punctuation. Also, it should be "an" when followed by a noun that begins with a vowel.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wang-x-xia commented on a change in pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
wang-x-xia commented on a change in pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#discussion_r702439320



##########
File path: build.gradle
##########
@@ -406,6 +406,7 @@ project(':iceberg-flink-runtime') {
     compile(project(':iceberg-nessie')) {
       exclude group: 'com.google.code.findbugs', module: 'jsr305'
     }
+    compile(project(':iceberg-dell'))

Review comment:
       Removed!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wang-x-xia commented on pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
wang-x-xia commented on pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#issuecomment-951842109


   Using the https://github.com/EMCECS/ecs-object-client-java, the PR is redundant to create an abstraction for ECS APIs.
   
   The following File IO PR is https://github.com/apache/iceberg/pull/3376.
   And the parent PR of ECS Catalog with related discussion is https://github.com/apache/iceberg/pull/2807.
   
   You can check these PRs for more details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wang-x-xia commented on pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
wang-x-xia commented on pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#issuecomment-913624007


   @rdblue Here is a problem with using our own SDK. 
   I noticed that https://github.com/EMCECS/ecs-object-client-java uses a BSD-3 license.
   If I use this, what more actions should I take?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#discussion_r701237163



##########
File path: dell/src/main/java/org/apache/iceberg/dell/ObjectKey.java
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.dell;
+
+import java.util.Objects;
+
+/**
+ * a immutable record class of object key
+ */
+public class ObjectKey {
+  /**
+   * bucket
+   */
+  private final String bucket;
+
+  /**
+   * key
+   */
+  private final String key;
+
+  public ObjectKey(String bucket, String key) {

Review comment:
       Something is not right here. The class is called `ObjectKey`, but it is created by passing a `String` called `key`. Is the string the key, or is it part of the key? Maybe this should be similar to S3, where the bucket/string combination is called a URI, `S3URI`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wang-x-xia commented on a change in pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
wang-x-xia commented on a change in pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#discussion_r702440531



##########
File path: dell/src/main/java/org/apache/iceberg/dell/ObjectKey.java
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.dell;
+
+import java.util.Objects;
+
+/**
+ * a immutable record class of object key
+ */
+public class ObjectKey {
+  /**
+   * bucket
+   */
+  private final String bucket;
+
+  /**
+   * key
+   */
+  private final String key;
+
+  public ObjectKey(String bucket, String key) {

Review comment:
       In object storage, the object key contains both a bucket name and a key.
   And in my design, the user can provide different delimiters for object storage.
   For example, the user can create an object which bucket is _a_ and key is _folder2021-folder09-obj1_ with delimiter _-_.
   It's hard to format it as a string, so I move this to the `ObjectKeys` interface.
   
   And I need some time to check whether using `emc://bucket/key` as a general string.

##########
File path: dell/src/main/java/org/apache/iceberg/dell/ObjectKey.java
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.dell;
+
+import java.util.Objects;
+
+/**
+ * a immutable record class of object key
+ */
+public class ObjectKey {
+  /**
+   * bucket
+   */
+  private final String bucket;
+
+  /**
+   * key
+   */
+  private final String key;
+
+  public ObjectKey(String bucket, String key) {
+    if (bucket == null || key == null) {
+      throw new IllegalArgumentException(String.format("bucket %s and key %s must be not null", bucket, key));
+    }
+    this.bucket = bucket;
+    this.key = key;
+  }
+
+  public String getBucket() {
+    return bucket;
+  }
+
+  public String getKey() {
+    return key;
+  }
+
+  @Override
+  public String toString() {

Review comment:
       See above.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wang-x-xia commented on a change in pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
wang-x-xia commented on a change in pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#discussion_r702441077



##########
File path: dell/src/main/java/org/apache/iceberg/dell/impl/EcsAppendOutputStream.java
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.dell.impl;
+
+import com.amazonaws.services.s3.AmazonS3;
+import com.amazonaws.services.s3.Headers;
+import com.amazonaws.services.s3.model.ObjectMetadata;
+import java.io.ByteArrayInputStream;
+import java.io.OutputStream;
+import java.nio.ByteBuffer;
+import org.apache.iceberg.dell.ObjectKey;
+
+/**
+ * use ECS append api to write data
+ */
+public class EcsAppendOutputStream extends OutputStream {

Review comment:
       The `PositionOutputStream` is just a simple extension of `OutputStream`. To implement it in this class is a little wired in OOP. I'll create an adapter class in the next PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#discussion_r701233917



##########
File path: dell/src/main/java/org/apache/iceberg/dell/EcsClient.java
##########
@@ -0,0 +1,226 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.dell;
+
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.function.Function;
+import org.apache.iceberg.catalog.Namespace;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.exceptions.NoSuchNamespaceException;
+
+/**
+ * ECS client and its util-methods.
+ */
+public interface EcsClient extends AutoCloseable {
+
+  /**
+   * utils of object key that provide methods to convert {@link ObjectKey} to other objects declared in Iceberg.
+   *
+   * @return utils class
+   */
+  ObjectKeys getKeys();
+
+  /**
+   * utils of properties that provide methods to convert object to properties.
+   *
+   * @return utils class
+   */
+  default PropertiesSerDes getPropertiesSerDes() {

Review comment:
       The Iceberg community doesn't use `get` in method names unless something needs to conform to Java Bean expectations. The reason is that `get` is almost never valuable: either there is a more specific verb that should be used instead, or it is basically useless and should be omitted.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] mechgouki edited a comment on pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
mechgouki edited a comment on pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#issuecomment-898145724


   @kbendick @rdblue 
   We would like to hear the suggestion whether we should wait for this the first new PR(1 of 3) pass the review and then start to post the rest ?
   Or
   Should we just go ahead to post the rest 2 PRs now?
    
    @openinx 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] mechgouki commented on pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
mechgouki commented on pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#issuecomment-916530749


   > @wang-x-xia, the 3-clause BSD license is compatible with the Apache License: https://www.apache.org/legal/resolved.html
   > 
   > It is listed as Category A. So it should be fine.
   > 
   > What is the rationale for using a library specific to EMC rather than the S3 v1 API? It seems like the S3 v1 API is probably going to be useful to more people. Are there EMC features that we can only get using the EMC client? You mentioned append support, but all of the formats that Iceberg uses are immutable. So we never need to re-open a file and append more data.
   
   
   Basically we have 2 areas which Dell EMC features could benefit :
   (1)Append operation in additional of MPU, if the client has less local cache for large object( like edge), Dell EMC object service could help here.
   (2)Atomic rename.  We have If-Match and If-None-Match semantic as we support strong consistency model (within one site) from the very beginning.
   
   And talked internally, for better support the customer, if we use Dell EMC  features, we'd better use our own client with BSD License.  AWS S3 V1 could be workaround here, but V2 can not be a option .
   
   Looking forward for your suggestion here
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#discussion_r701250221



##########
File path: dell/src/main/java/org/apache/iceberg/dell/impl/ContentAndETagImpl.java
##########
@@ -0,0 +1,42 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.dell.impl;
+
+import org.apache.iceberg.dell.EcsClient;
+import org.apache.iceberg.dell.ObjectHeadInfo;
+
+/**
+ * a record class of {@link EcsClient.ContentAndETag}
+ */
+public class ContentAndETagImpl implements EcsClient.ContentAndETag {
+
+  private final ObjectHeadInfo headInfo;
+  private final byte[] content;
+
+  public ContentAndETagImpl(ObjectHeadInfo headInfo, byte[] content) {
+    this.headInfo = headInfo;
+    this.content = content;

Review comment:
       What is content here? Is it an etag?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wang-x-xia commented on a change in pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
wang-x-xia commented on a change in pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#discussion_r702440531



##########
File path: dell/src/main/java/org/apache/iceberg/dell/ObjectKey.java
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.dell;
+
+import java.util.Objects;
+
+/**
+ * a immutable record class of object key
+ */
+public class ObjectKey {
+  /**
+   * bucket
+   */
+  private final String bucket;
+
+  /**
+   * key
+   */
+  private final String key;
+
+  public ObjectKey(String bucket, String key) {

Review comment:
       In object storage, the object key contains both a bucket name and a key.
   And in my design, the user can provide different delimiters for object storage.
   For example, the user can create an object which bucket is _a_ and key is _folder2021-folder09-obj1_ with delimiter _-_.
   It's hard to format it as a string, so I move this to {ObjectKeys} interface.
   
   And I need some time to check whether using `emc://bucket/key` as a general string.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#discussion_r701249965



##########
File path: dell/src/main/java/org/apache/iceberg/dell/impl/EcsAppendOutputStream.java
##########
@@ -0,0 +1,103 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.iceberg.dell.impl;
+
+import com.amazonaws.services.s3.AmazonS3;
+import com.amazonaws.services.s3.Headers;
+import com.amazonaws.services.s3.model.ObjectMetadata;
+import java.io.ByteArrayInputStream;
+import java.io.OutputStream;
+import java.nio.ByteBuffer;
+import org.apache.iceberg.dell.ObjectKey;
+
+/**
+ * use ECS append api to write data
+ */
+public class EcsAppendOutputStream extends OutputStream {

Review comment:
       Why doesn't this implement `PositionOutputStream`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] mechgouki commented on pull request #2847: [2806]Dell EMC ECS features required by a new ECS Catalog impl

Posted by GitBox <gi...@apache.org>.
mechgouki commented on pull request #2847:
URL: https://github.com/apache/iceberg/pull/2847#issuecomment-898145724


   @kbendick @rdblue 
   We would like to hear the suggestion whether we should wait for this the first new PR(1 of 3) pass the review and then start to post the rest ?
   Or
   Should we just go ahead to post the rest 2 PRs now?
    
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org