You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/12/22 19:39:31 UTC

[GitHub] [iceberg] jackye1995 commented on a change in pull request #3663: fix data loss in concurrent appending with HadoopTable

jackye1995 commented on a change in pull request #3663:
URL: https://github.com/apache/iceberg/pull/3663#discussion_r774122597



##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
##########
@@ -105,6 +108,8 @@ public void initialize(String name, Map<String, String> properties) {
     this.fileIO = fileIOImpl == null ? new HadoopFileIO(conf) : CatalogUtil.loadFileIO(fileIOImpl, properties, conf);
 
     this.suppressPermissionError = Boolean.parseBoolean(properties.get(HADOOP_SUPPRESS_PERMISSION_ERROR));
+
+    this.lockManager = LockManagers.from(properties);

Review comment:
       I don't think we should do this for `HadoopCatalog`, because hadoop catalog users usually can leverage Hadoop file system's mutual exclusive writing feature and does not need a pessimistic lock to do commit. I think we should allow `LockManager` to be null in `HadoopTableOperations`, and only if it is configured, we use it. 
   
   This probably also means we need another config property `lock-enabled`.

##########
File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopTables.java
##########
@@ -194,8 +203,24 @@ TableOperations newTableOps(String location) {
     if (location.contains(METADATA_JSON)) {
       return new StaticTableOperations(location, new HadoopFileIO(conf));
     } else {
-      return new HadoopTableOperations(new Path(location), new HadoopFileIO(conf), conf);
+      return new HadoopTableOperations(new Path(location), new HadoopFileIO(conf), conf,
+              createOrGetLockManager(location));
+    }
+  }
+
+  private LockManager createOrGetLockManager(String location) {
+    Map<String, String> properties = Maps.newHashMap();
+    Iterator<Map.Entry<String, String>> configEntries = conf.iterator();
+    while (configEntries.hasNext()) {
+      Map.Entry<String, String> entry = configEntries.next();
+      String key = entry.getKey();
+      if (key.startsWith("iceberg.lock.")) {

Review comment:
       "iceberg." should be a public static variable because a process should import this variable to construct lock properties. And I wonder if we should use something more complex to avoid namespace collision, such as "iceberg.tables.hadoop."

##########
File path: core/src/main/java/org/apache/iceberg/util/LockManager.java
##########
@@ -17,14 +17,14 @@
  * under the License.
  */
 
-package org.apache.iceberg.aws.glue;
+package org.apache.iceberg.util;

Review comment:
       why is this in the `util` package? If we want to properly support it, it feels more nature to me to add this to `iceberg-api`.

##########
File path: aws/src/integration/java/org/apache/iceberg/aws/glue/TestDynamoLockManager.java
##########
@@ -153,9 +153,9 @@ public void testAcquireSingleProcess() throws Exception {
   @Test
   public void testAcquireMultiProcessAllSucceed() throws Exception {
     lockManager.initialize(ImmutableMap.of(
-        CatalogProperties.LOCK_ACQUIRE_INTERVAL_MS, "500",
-        CatalogProperties.LOCK_ACQUIRE_TIMEOUT_MS, "100000000",
-        CatalogProperties.LOCK_TABLE, lockTableName
+        LockManagerProperties.LOCK_ACQUIRE_INTERVAL_MS, "500",
+        LockManagerProperties.LOCK_ACQUIRE_TIMEOUT_MS, "100000000",
+        LockManagerProperties.LOCK_TABLE, lockTableName

Review comment:
       These properties cannot change class for backwards compatibility. 
   
   And I don't think it breaks definition of `CatalogProperties` to keep these configurations there, because there are 2 places using this, (1) hadoop catalog, (2) hadoop tables. For (1), it's still in a catalog, and for (2), you are technically not using these properties directly, but only using it with a pre-defined prefix. That is a custom definition in `HadoopTables` to use these properties in the specific way, I don't think we need to also move properties to another class to accommodate this use case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org