You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/07 23:26:33 UTC

[GitHub] [iceberg] szehon-ho opened a new issue #3087: Implement retryDelays and retries in Iceberg Hive Client

szehon-ho opened a new issue #3087:
URL: https://github.com/apache/iceberg/issues/3087


   Iceberg uses its own ClientPoolImpl to implement retries.  However, it lacks fine-grained retry configs like:  retries and retry-backoff.  It may make sense to implement this in high throughput environments to squeeze more out of Hive Metastores.
   
   One solution is to use Hive's native RetryingMetastoreClient: https://github.com/apache/iceberg/pull/2804 but it seems to conflict a bit with some existing code.
   
   Another solution is to continue own retrying logic in ClientPoolImpl and implement backoff and retries there, which is probably simpler at this point.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on issue #3087: Implement retryDelays and retries in Iceberg Hive Client

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on issue #3087:
URL: https://github.com/apache/iceberg/issues/3087#issuecomment-953518441


   Issue is fixed with #3099 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on issue #3087: Implement retryDelays and retries in Iceberg Hive Client

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #3087:
URL: https://github.com/apache/iceberg/issues/3087#issuecomment-917144606


   I don't have a strong opinion here, so if using the Hive client is better we can definitely explore it. We'll want to make sure that it doesn't require any additional dependencies and we'll probably also want to turn off our retries.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho edited a comment on issue #3087: Implement retryDelays and retries in Iceberg Hive Client

Posted by GitBox <gi...@apache.org>.
szehon-ho edited a comment on issue #3087:
URL: https://github.com/apache/iceberg/issues/3087#issuecomment-917078226


   Thanks for the replies guys.  I started working on a POC for Peter's option: https://github.com/apache/iceberg/pull/3099  
   
   I also see many advantages of using Hive's native [RetryingMetaStoreClient](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java).  To repeat some of his and also add some of my own:
   
   * RetryingMetaStoreClient is used today in Hive and Spark, and is more mature with all known Hive failure types. HiveClientPool will have to catch up to all the Hive exception types to retry: https://github.com/apache/iceberg/pull/2844 for some missing exceptions, for example.
   * RetryingMetaStoreClient handles UGI impersonation logic for reconnect (needed in Kerberized environments)
   * ClientPoolImpl does not support any configuration of retry and retry-backoff. It hard-codes to 1 retry and no backoff, which may not be ideal in all cases (the default in RetryingMetaStoreClient is 3 retries and 1s backoff for instance).
   * For user experience, using RetryingMetaStoreClient would unify all Hive configs.  For instance in Spark, setting 'spark.hadoop.hive.metastore.client.connect.retry.delay' will set it for all Hive connections (for Iceberg and non-Iceberg tables).  Whereas if we add configs to ClientPool, it will be per catalog level, which may not make sense to user, and the config flags will be different than what they are used to.
   
   @rdblue  can you see if these justifications make sense, or if you guys still feel strongly about adding more logic in ClientPool instead?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho edited a comment on issue #3087: Implement retryDelays and retries in Iceberg Hive Client

Posted by GitBox <gi...@apache.org>.
szehon-ho edited a comment on issue #3087:
URL: https://github.com/apache/iceberg/issues/3087#issuecomment-917078226


   Thanks for the replies guys.  I started working on a POC for Peter's option: https://github.com/apache/iceberg/issues/3087.  
   
   I also see many advantages of using Hive's native [RetryingMetaStoreClient](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java).  To repeat some of his and also add some of my own:
   
   * RetryingMetaStoreClient is used today in Hive and Spark, and is more mature with all known Hive failure types. HiveClientPool will have to catch up to all the Hive exception types to retry: https://github.com/apache/iceberg/pull/2844 for some missing exceptions, for example.
   * RetryingMetaStoreClient handles UGI impersonation logic for reconnect (needed in Kerberized environments)
   * ClientPoolImpl does not support any configuration of retry and retry-backoff. It hard-codes to 1 retry and no backoff, which may not be ideal in all cases (the default in RetryingMetaStoreClient is 1s backoff for instance).
   * For user experience, using RetryingMetaStoreClient would unify all Hive configs.  For instance in Spark, setting 'spark.hadoop.hive.metastore.client.connect.retry.delay' will set it for all Hive connections (for Iceberg and non-Iceberg tables).  Whereas if we add configs to ClientPool, it will be per catalog level, which may not make sense to user, and the config flags will be different than what they are used to.
   
   @rdblue  can you see if these justifications make sense, or if you guys still feel strongly about adding more logic in ClientPool instead?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho edited a comment on issue #3087: Implement retryDelays and retries in Iceberg Hive Client

Posted by GitBox <gi...@apache.org>.
szehon-ho edited a comment on issue #3087:
URL: https://github.com/apache/iceberg/issues/3087#issuecomment-915401031


   @pvary thanks, yea that makes sense.  Looking at it, the only small problem is that ClientPool would add one more layer of retry and so that probably should be removed based on this approach, and if other client pools implementations are planned that would have taken advantage of retry at that layer (probably should disable the ClientPool layer retry just for Hive then)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on issue #3087: Implement retryDelays and retries in Iceberg Hive Client

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #3087:
URL: https://github.com/apache/iceberg/issues/3087#issuecomment-916738242


   The `RetryingMetaStoreClient` does the following:
   - Retries the failed request based on the Hive logic ([try-catch](https://github.com/apache/hive/blob/8f4f3b90fa5987e82025ecf81f8084c90130fd6b/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java#L223-L253), [isRecoverableMetaException](https://github.com/apache/hive/blob/8f4f3b90fa5987e82025ecf81f8084c90130fd6b/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java#L267-L276))
   - Limits the number of retries
   - Delays the retries
   - Handles connection lifetimes
   - Handles expired keytab users
   - Provides a way to collect HMS response times


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on issue #3087: Implement retryDelays and retries in Iceberg Hive Client

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on issue #3087:
URL: https://github.com/apache/iceberg/issues/3087#issuecomment-914691440


   @pvary @rymurr  @rdblue do you guys have thoughts on it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho edited a comment on issue #3087: Implement retryDelays and retries in Iceberg Hive Client

Posted by GitBox <gi...@apache.org>.
szehon-ho edited a comment on issue #3087:
URL: https://github.com/apache/iceberg/issues/3087#issuecomment-915401031


   @pvary thanks, yea that makes sense.  Looking at it, the only small problem is that ClientPool would add one more layer of retry and so that probably should be removed based on this approach, and if other client pools implementations are planned that would have taken advantage of retry at that layer (probably should make the ClientPool layer retry configurable then)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on issue #3087: Implement retryDelays and retries in Iceberg Hive Client

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on issue #3087:
URL: https://github.com/apache/iceberg/issues/3087#issuecomment-915401031


   @pvary thanks, yea that makes sense.  Looking at it, the only small problem is that ClientPool would add one more layer of retry and so that probably should be removed based on this approach, and if other client pools implementations are planned that would take advantage of retry at that layer.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on issue #3087: Implement retryDelays and retries in Iceberg Hive Client

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on issue #3087:
URL: https://github.com/apache/iceberg/issues/3087#issuecomment-917078226


   Thanks for the replies guys.  I started working on a POC for Peter's option: https://github.com/apache/iceberg/issues/3087.  
   
   I also see many advantages of using Hive's native [RetryingMetaStoreClient](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java).  To repeat some of his and also add some of my own:
   
   * RetryingMetaStoreClient is the standard in Hive and Spark, and is more mature with all known Hive failure types. HiveClientPool will have to catch up to all the Hive exception types to retry: https://github.com/apache/iceberg/pull/2844 for some missing exceptions, for example.
   * RetryingMetaStoreClient handles UGI impersonation logic for reconnect (needed in Kerberized environments)
   * ClientPoolImpl does not support any configuration of retry and retry-backoff. It hard-codes to 1 retry and no backoff, which may not be ideal in all cases (the default in RetryingMetaStoreClient is 1s backoff for instance).
   * For user experience, using RetryingMetaStoreClient would unify all Hive configs.  For instance in Spark, setting 'spark.hadoop.hive.metastore.client.connect.retry.delay' will set it for all Hive connections (for Iceberg and non-Iceberg tables).  Whereas if we add configs to ClientPool, it will be per catalog level, which may not make sense to user, and the config flags will be different than what they are used to.
   
   @rdblue  can you see if these justifications make sense, or if you guys still feel strongly about adding more logic in ClientPool instead?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho closed issue #3087: Implement retryDelays and retries in Iceberg Hive Client

Posted by GitBox <gi...@apache.org>.
szehon-ho closed issue #3087:
URL: https://github.com/apache/iceberg/issues/3087


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on issue #3087: Implement retryDelays and retries in Iceberg Hive Client

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #3087:
URL: https://github.com/apache/iceberg/issues/3087#issuecomment-915059716


   @szehon-ho: I consider it a tech debt that we are using `HiveMetastoreClient` in Iceberg instead of `IMetaStoreClient`. So I would prefer a solution where we start to use `IMetaStoreClient`, and the `RetryingMetastoreClient`, but I am clearly Hive biased 😄, so others might have different thoughts


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on issue #3087: Implement retryDelays and retries in Iceberg Hive Client

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #3087:
URL: https://github.com/apache/iceberg/issues/3087#issuecomment-916511289


   I'd probably prefer having our own retry logic that we can standardize on across uses of ClientPool. But I also don't really know what benefits there are to use `IMetaStoreClient` and similar. We chose the first thing that worked.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho edited a comment on issue #3087: Implement retryDelays and retries in Iceberg Hive Client

Posted by GitBox <gi...@apache.org>.
szehon-ho edited a comment on issue #3087:
URL: https://github.com/apache/iceberg/issues/3087#issuecomment-917078226


   Thanks for the replies guys.  I started working on a POC for Peter's option: https://github.com/apache/iceberg/pull/3099  
   
   I also see many advantages of using Hive's native [RetryingMetaStoreClient](https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java).  To repeat some of his and also add some of my own:
   
   * RetryingMetaStoreClient is used today in Hive and Spark, and is more mature with all known Hive failure types. HiveClientPool will have to catch up to all the Hive exception types to retry: https://github.com/apache/iceberg/pull/2844 for some missing exceptions, for example.
   * RetryingMetaStoreClient handles UGI impersonation logic for reconnect (needed in Kerberized environments)
   * ClientPoolImpl does not support any configuration of retry and retry-backoff. It hard-codes to 1 retry and no backoff, which may not be ideal in all cases (the default in RetryingMetaStoreClient is 1s backoff for instance).
   * For user experience, using RetryingMetaStoreClient would unify all Hive configs.  For instance in Spark, setting 'spark.hadoop.hive.metastore.client.connect.retry.delay' will set it for all Hive connections (for Iceberg and non-Iceberg tables).  Whereas if we add configs to ClientPool, it will be per catalog level, which may not make sense to user, and the config flags will be different than what they are used to.
   
   @rdblue  can you see if these justifications make sense, or if you guys still feel strongly about adding more logic in ClientPool instead?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org