Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/06 23:14:16 UTC

[GitHub] [iceberg] dmgcodevil opened a new issue, #6370: What is the purpose of Hive Lock ?

dmgcodevil opened a new issue, #6370:
URL: https://github.com/apache/iceberg/issues/6370

   ### Query engine
   
   Flink/Spark/Trino
   
   ### Question
   
   What is the purpose of the Hive Lock? Is it possible to replace it with an atomic operation such as `replace(expectedMetadata, newMetadata)`? The Hive metastore is backed by a SQL database, which supports transactions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] github-actions[bot] commented on issue #6370: What is the purpose of Hive Lock ?

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1631669765

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.


[GitHub] [iceberg] pvary commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1341372808

   @InvisibleProgrammer: What do the Hive guys think about this? Would they be interested in adding this feature to HMS?


[GitHub] [iceberg] dmgcodevil commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
dmgcodevil commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1342997287

   Does the JdbcCatalog support atomic updates without locks?


[GitHub] [iceberg] pvary commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1356061017

   @InvisibleProgrammer, @TuroczyX: any news about the atomic lock?


[GitHub] [iceberg] pvary commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1341067753

   @dmgcodevil: The purpose of the Hive Lock is to make sure that there are no concurrent changes to the table, specifically that there is no concurrent Iceberg commit.
   
   In theory the replace is a good idea, but even in the best case it would be a new feature in HMS, which could only be supported in the upcoming releases (4.0.0). Most of the Iceberg community uses the older 3.x Hive (HMS) versions, or in some cases the even older 2.x ones, and adding a new feature to those release lines is usually not allowed.
   
   From the Iceberg perspective, though, this change would be very much needed and welcome, as it would mean a faster and more reliable commit flow for HiveCatalogs.
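
For readers following the thread, here is a minimal sketch of the lock-guarded commit described above. It is an in-memory illustration only, not Iceberg or Hive code: the class `LockedCommitSketch`, its methods, and the use of a `ReentrantLock` in place of the Hive lock are all assumptions made for the example.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for one table entry in the metastore.
final class LockedCommitSketch {
  // Plays the role of the exclusive Hive lock on the table (assumption).
  private final ReentrantLock tableLock = new ReentrantLock();
  // Plays the role of the pointer to the current table metadata.
  private String metadataLocation;

  LockedCommitSketch(String initialLocation) {
    this.metadataLocation = initialLocation;
  }

  String currentLocation() {
    tableLock.lock();
    try {
      return metadataLocation;
    } finally {
      tableLock.unlock();
    }
  }

  // Swaps the pointer only if nobody committed since baseLocation was read.
  boolean commit(String baseLocation, String newLocation) {
    tableLock.lock(); // no other committer can run between the check and the write
    try {
      if (!metadataLocation.equals(baseLocation)) {
        return false; // a concurrent commit already moved the pointer
      }
      metadataLocation = newLocation;
      return true;
    } finally {
      tableLock.unlock(); // always release, otherwise the lock is orphaned
    }
  }

  public static void main(String[] args) {
    LockedCommitSketch table = new LockedCommitSketch("v1.metadata.json");
    System.out.println(table.commit("v1.metadata.json", "v2.metadata.json"));
    System.out.println(table.commit("v1.metadata.json", "v3.metadata.json"));
    System.out.println(table.currentLocation());
  }
}
```

The lock exists only to make the check and the write one indivisible step, which is why an atomic conditional update on the HMS side could in principle replace it.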


[GitHub] [iceberg] dmgcodevil commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
dmgcodevil commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1379151268

   I still don't understand why we need locks if we have transactions and can implement an optimistic locking model.


[GitHub] [iceberg] pvary commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1379883500

   With apache/hive#3888 we can implement a solution that handles failures the same way as the current one but without using locks, relying instead on the `alter_table` call failing when the table has been changed concurrently.


[GitHub] [iceberg] TuroczyX commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
TuroczyX commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1341382512

   It is definitely something that we need to consider. We will talk about it at our next meeting.


[GitHub] [iceberg] pvary commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1371847428

   @dmgcodevil: after the Hive PR we need to add a table property for Iceberg tables which will do 2 things:
   - Adds the new `expected_parameter_key` and `expected_parameter_value` properties to the `alter_table` context
   - Turns off the lock acquisition and heartbeating.
   
   This would work only in the following situation:
   - HMS server contains the new Hive change
   - All of the Iceberg writers contain the new Iceberg change (the one which turns off the locks and adds the new parameters to the alter table call)
   
   In every other situation, the old locking should be used
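
The two effects of the table property listed above could look roughly like the following. This is a sketch under stated assumptions, not the actual Iceberg change: the property name `example.lock-free-commit` is hypothetical, the context is modeled as a plain `Map` rather than Hive's own context class, and `metadata_location` is assumed to be the table parameter being compared.

```java
import java.util.HashMap;
import java.util.Map;

final class AlterTableContextSketch {
  // Hypothetical property name, used only for this illustration.
  static final String LOCK_FREE_PROPERTY = "example.lock-free-commit";
  // Assumed table parameter that holds the pointer to the current metadata file.
  static final String METADATA_LOCATION = "metadata_location";

  // Locks stay on unless the table explicitly opts in to the lock-free path.
  static boolean useLocks(Map<String, String> tableProperties) {
    return !Boolean.parseBoolean(tableProperties.getOrDefault(LOCK_FREE_PROPERTY, "false"));
  }

  // Builds the extra entries passed along with the alter_table call.
  static Map<String, String> alterTableContext(
      Map<String, String> tableProperties, String baseMetadataLocation) {
    Map<String, String> context = new HashMap<>();
    if (!useLocks(tableProperties)) {
      // Ask the server to reject the change if the parameter no longer has this value.
      context.put("expected_parameter_key", METADATA_LOCATION);
      context.put("expected_parameter_value", baseMetadataLocation);
    }
    return context;
  }

  public static void main(String[] args) {
    Map<String, String> props = Map.of(LOCK_FREE_PROPERTY, "true");
    System.out.println(useLocks(props));
    System.out.println(alterTableContext(props, "v1.metadata.json"));
  }
}
```

A client cannot verify from a property alone that the HMS server and every other writer carry the new change, so in this sketch the property is an explicit opt-in and the default remains the old locking.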


[GitHub] [iceberg] dmgcodevil commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
dmgcodevil commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1363118516

   @pvary is it still based on locks?  


[GitHub] [iceberg] pvary commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1380107713

   Created #6570 for removing the locks


[GitHub] [iceberg] pvary commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1346635105

   @InvisibleProgrammer: Fair enough. If it had been a simple question, I might not have asked 😄 


[GitHub] [iceberg] dmgcodevil commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
dmgcodevil commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1371833337

   @pvary I mean, does it solve the problem with orphan locks? Also, we see a lot of errors thrown [here](https://github.com/apache/iceberg/blob/master/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L294), but I don't understand how that's possible. Is it because some other process released the lock and updated the table? Nonetheless, can we get rid of locks, i.e. this [line](https://github.com/apache/iceberg/blob/master/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L266)?


[GitHub] [iceberg] dmgcodevil commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
dmgcodevil commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1371892856

   I see, it will be an optimistic locking:  
   
   ```
   1.  set  properties  
   2.  send a request to HMS
   3. if failed -> go to step 1
   4. if succeeded -> exit
   ```
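
The four steps above can be sketched as a retry loop. This is an in-memory illustration, not the HiveCatalog implementation: `OptimisticCommitSketch` is a hypothetical class, and `AtomicReference.compareAndSet` stands in for an `alter_table` call that fails when the expected value no longer matches.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

final class OptimisticCommitSketch {
  // Stand-in for the metastore entry holding the table's metadata pointer.
  private final AtomicReference<String> metadataLocation;

  OptimisticCommitSketch(String initialLocation) {
    this.metadataLocation = new AtomicReference<>(initialLocation);
  }

  String current() {
    return metadataLocation.get();
  }

  // Returns the number of attempts used, or -1 if every attempt hit a conflict.
  int commit(UnaryOperator<String> change, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      // Step 1: read the current state; it becomes the expected value.
      String base = metadataLocation.get();
      String updated = change.apply(base);
      // Step 2: conditional update. compareAndSet compares references, which is
      // enough here because base is the very object read from the store.
      if (metadataLocation.compareAndSet(base, updated)) {
        return attempt; // Step 4: success, exit.
      }
      // Step 3: someone else committed first; loop and rebuild on fresh state.
    }
    return -1;
  }

  public static void main(String[] args) {
    OptimisticCommitSketch table = new OptimisticCommitSketch("v1");
    System.out.println(table.commit(base -> base + "+append", 3));
    System.out.println(table.current());
  }
}
```

Each retry re-reads the table and reapplies the change to the fresh state, so a conflict costs a repeated attempt but leaves no lock behind if the writer dies.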


[GitHub] [iceberg] hililiwei commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
hililiwei commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1343786725

   > Just from the Iceberg perspective this change would be very much needed and welcome as it would mean faster/more reliable commit flow for HiveCatalogs
   
   +1. If we could do that, that would be great. We sometimes encounter unexpected issues with Hive Lock in production.
   


[GitHub] [iceberg] pvary commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1356124377

   Maybe something like this:
   ```
   diff --git a/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift b/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
   index 179a4960b9..6a3f9e40fa 100644
   --- a/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
   +++ b/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
   @@ -22,7 +22,7 @@
    # Thrift Service that the MetaStore is built on
    #
    
   -include "share/fb303/if/fb303.thrift"
   +include "/Users/petervary/tmp/fb303.thrift"
    
    namespace java org.apache.hadoop.hive.metastore.api
    namespace php metastore
   @@ -2165,7 +2165,9 @@ struct AlterTableRequest {
      6: optional i64 writeId=-1,
      7: optional string validWriteIdList
      8: optional list<string> processorCapabilities,
   -  9: optional string processorIdentifier
   +  9: optional string processorIdentifier,
   +  10: optional string expectedPropertyKey,
   +  12: optional string expectedPropertyValue
    // TODO: also add cascade here, out of envCtx
    }
   
   diff --git a/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
   index 1226cd1a1a..0b12e68401 100644
   --- a/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
   +++ b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java
   @@ -102,7 +102,7 @@ public void setConf(Configuration conf) {
      @Override
      public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbname,
          String name, Table newt, EnvironmentContext environmentContext,
   -      IHMSHandler handler, String writeIdList)
   +      IHMSHandler handler, String writeIdList, String expectedKey, String expectedValue)
              throws InvalidOperationException, MetaException {
        catName = normalizeIdentifier(catName);
        name = name.toLowerCase();
   @@ -187,6 +187,11 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
                TableName.getQualified(catName, dbname, name) + " doesn't exist");
          }
    
   +      if (expectedKey != null && !java.util.Objects.equals(oldt.getParameters().get(expectedKey), expectedValue)) {
   +        throw new MetaException("The table has already been modified. The parameter value for key: " + expectedKey + " is "
   +                + oldt.getParameters().get(expectedKey) + ". The expected value was " + expectedValue);
   +      }
   +
          validateTableChangesOnReplSource(olddb, oldt, newt, environmentContext);
    
          // On a replica this alter table will be executed only if old and new both the databases are
   
   ```


[GitHub] [iceberg] InvisibleProgrammer commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
InvisibleProgrammer commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1346382736

   @pvary I think it is not a simple yes-or-no question. We need some time to better understand the topic and the consequences of the change.


[GitHub] [iceberg] github-actions[bot] commented on issue #6370: What is the purpose of Hive Lock ?

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1656974399

   This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'


[GitHub] [iceberg] github-actions[bot] closed issue #6370: What is the purpose of Hive Lock ?

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #6370: What is the purpose of Hive Lock ?
URL: https://github.com/apache/iceberg/issues/6370


[GitHub] [iceberg] pvary commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1364137333

   Why would it be based on locks? Changes made inside a single HMS call are transactional.
   This is essential; otherwise concurrent HMS changes could push the Metastore into an inconsistent state.
   Using this transaction, we could check and set the snapshot of the Iceberg table.


[GitHub] [iceberg] pvary commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1361628513

   Created the Hive jira: https://issues.apache.org/jira/browse/HIVE-26882
   And the PR: https://github.com/apache/hive/pull/3888


[GitHub] [iceberg] pvary commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1369791961

   @dmgcodevil: It turns out you were right. Even though we make the changes in a transaction on the HMS side, the isolation level we use for those changes is `READ_COMMITTED`. This does not prevent two concurrent transactions from both reading the old value and then both writing out new ones.
   
   Thanks for the feedback! I updated the Hive PR to use `REPEATABLE_READ`, which would prevent this situation and act as a lock for our case.
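
The interleaving described above can be replayed deterministically in memory. This sketch does not model a real database or its isolation levels: `LostUpdateSketch` is hypothetical, and its `synchronized` `checkAndWrite` method simply stands in for whatever makes the check and the write indivisible.

```java
import java.util.HashMap;
import java.util.Map;

final class LostUpdateSketch {
  private static final String KEY = "metadata_location";
  private final Map<String, String> params = new HashMap<>();

  LostUpdateSketch(String location) {
    params.put(KEY, location);
  }

  String read() {
    return params.get(KEY);
  }

  void write(String location) {
    params.put(KEY, location);
  }

  // Check and write as one indivisible step.
  synchronized boolean checkAndWrite(String expected, String newLocation) {
    if (!expected.equals(read())) {
      return false;
    }
    write(newLocation);
    return true;
  }

  // Both writers run their check before either writes, so both checks pass
  // and writer A's commit is silently overwritten by writer B.
  static int unsafeInterleaving(LostUpdateSketch table) {
    String base = table.read();
    boolean aPassed = base.equals(table.read()); // writer A checks
    boolean bPassed = base.equals(table.read()); // writer B checks before A writes
    int successes = 0;
    if (aPassed) { table.write("v2-from-A"); successes++; }
    if (bPassed) { table.write("v2-from-B"); successes++; }
    return successes;
  }

  // With an indivisible check-and-write, the second writer sees the new value and fails.
  static int atomicInterleaving(LostUpdateSketch table) {
    String base = table.read();
    int successes = 0;
    if (table.checkAndWrite(base, "v2-from-A")) successes++;
    if (table.checkAndWrite(base, "v2-from-B")) successes++;
    return successes;
  }

  public static void main(String[] args) {
    System.out.println(unsafeInterleaving(new LostUpdateSketch("v1")));
    System.out.println(atomicInterleaving(new LostUpdateSketch("v1")));
  }
}
```

In the first replay both writers believe they committed, which is the situation the expected-value check has to rule out.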
   


[GitHub] [iceberg] pvary commented on issue #6370: What is the purpose of Hive Lock ?

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #6370:
URL: https://github.com/apache/iceberg/issues/6370#issuecomment-1371906638

   In this regard, this will be the same as with normal locks

