Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2023/09/29 08:53:00 UTC

[jira] [Commented] (IMPALA-11501) Add flag to allow metadata-cache operations on masked tables

    [ https://issues.apache.org/jira/browse/IMPALA-11501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770339#comment-17770339 ] 

Quanlong Huang commented on IMPALA-11501:
-----------------------------------------

Another thing this Jira should take care of: the INVALIDATE METADATA <table> statement should not trigger metadata loading when "allow_refresh_by_masked_users" is set to true.

Before IMPALA-10554 and IMPALA-11281, INVALIDATE METADATA <table> didn't trigger metadata loading in either the legacy catalog mode or the local catalog mode. It was supposed to finish fast and not be blocked by concurrent DDLs.

After IMPALA-10554, the authorization check on the INVALIDATE/REFRESH request fetches the column info, which triggers metadata loading in local catalog mode. Code snippet:
{code:java}
private void authorizePrivilegeRequest(AuthorizationContext authzCtx,
    AnalysisResult analysisResult, FeCatalog catalog, PrivilegeRequest request)
    throws AuthorizationException, InternalException {
  Preconditions.checkNotNull(request);
  String dbName = null;
  if (request.getAuthorizable() != null) {
    dbName = request.getAuthorizable().getDbName();
  }
  // If this is a system database, some actions should always be allowed
  // or disabled, regardless of what is in the auth policy.
  if (dbName != null && checkSystemDbAccess(catalog, dbName, request.getPrivilege())) {
    return;
  }
  // Populate column names to check column masking policies in blocking updates.
  if (config_.isEnabled() && request.getAuthorizable() != null
      && request.getAuthorizable().getType() == Type.TABLE) {
    Preconditions.checkNotNull(dbName);
    AuthorizableTable authorizableTable = (AuthorizableTable) request.getAuthorizable();
    FeDb db = catalog.getDb(dbName);
    if (db != null) {
      // 'db', 'table' could be null for an unresolved table ref. 'table' could be
      // null for target table of a CTAS statement. Don't need to populate column
      // names in such cases since no column masking policies will be checked.
      FeTable table = db.getTable(authorizableTable.getTableName());  // <---- This will trigger metadata loading in local catalog mode
      if (table != null && !(table instanceof FeIncompleteTable)) {
        authorizableTable.setColumns(table.getColumnNames());
      }
    }
  }
  checkAccess(authzCtx, analysisResult.getAnalyzer().getUser(), request);
}{code}
[https://github.com/apache/impala/blob/2baed42/fe/src/main/java/org/apache/impala/authorization/BaseAuthorizationChecker.java#L226]

In local catalog mode, if the table metadata is not cached locally, the call to db.getTable() sends a getPartialCatalogObject request to catalogd, which could be blocked in two places:
 * if the table is also unloaded in catalogd, the request triggers metadata loading and has to wait for it to finish.
 * if the table is locked by a concurrent DDL/DML, the request has to wait since it requires the table read lock.

These make INVALIDATE METADATA run slowly on large tables or on tables that have frequent DDL/DMLs.
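
To illustrate the second blocking point, here is a minimal, self-contained sketch using a plain java.util.concurrent read/write lock. It is not Impala code: TableLockSketch, tryRead() and demo() are made-up names, and the 50 ms timeout is arbitrary. One thread plays the DDL that holds the table write lock, and the main thread plays the metadata fetch that needs the read lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; none of these names come from Impala.
final class TableLockSketch {

  // Tries to take the read lock the way a metadata fetch would,
  // giving up after timeoutMs. Returns true if the lock was acquired.
  static boolean tryRead(ReentrantReadWriteLock lock, long timeoutMs) {
    try {
      if (!lock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
        return false;
      }
      lock.readLock().unlock();
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  // Returns {readerSucceededDuringDdl, readerSucceededAfterDdl}.
  static boolean[] demo() {
    ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock();
    CountDownLatch ddlHoldsLock = new CountDownLatch(1);
    CountDownLatch ddlMayFinish = new CountDownLatch(1);

    // Simulated DDL: takes the write lock and holds it until told to finish.
    Thread ddl = new Thread(() -> {
      tableLock.writeLock().lock();
      try {
        ddlHoldsLock.countDown();
        ddlMayFinish.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        tableLock.writeLock().unlock();
      }
    });
    ddl.start();

    try {
      ddlHoldsLock.await();
      boolean duringDdl = tryRead(tableLock, 50);  // write lock is held elsewhere
      ddlMayFinish.countDown();
      ddl.join();
      boolean afterDdl = tryRead(tableLock, 50);   // lock is free again
      return new boolean[] {duringDdl, afterDdl};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    boolean[] r = demo();
    System.out.println("during DDL: " + r[0] + ", after DDL: " + r[1]);
  }
}
```

With a blocking lock() instead of tryLock(), the reader would simply wait for as long as the DDL runs, which is the slowdown described above.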

In the legacy catalog mode, db.getTable() just returns what is in the cache. For an unloaded table, it returns an IncompleteTable object, which has no column info and leads to the bug described in IMPALA-11281. IMPALA-11281 fixes the bug by forcing metadata loading for INVALIDATE/REFRESH commands.

So for branches that have IMPALA-10554, INVALIDATE METADATA <table> could be blocked in the above two places in local catalog mode. For branches that have both IMPALA-10554 and IMPALA-11281, INVALIDATE METADATA <table> could be blocked in both catalog modes.
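
The cases above can be written down as a small decision function. This is only a restatement of the summary, not Impala code; the enum and method names are made up for illustration.

```java
// Hypothetical restatement of the summary above; names are not from Impala.
final class InvalidateBlockingSketch {

  // Which of the two changes a branch contains.
  enum Branch { BEFORE_10554, WITH_10554, WITH_10554_AND_11281 }

  enum CatalogMode { LEGACY, LOCAL }

  // True if INVALIDATE METADATA <table> can end up waiting on metadata
  // loading or on the table read lock during the authorization check.
  static boolean invalidateMayBlock(Branch branch, CatalogMode mode) {
    switch (branch) {
      case BEFORE_10554:
        return false;                       // no column lookup at all
      case WITH_10554:
        return mode == CatalogMode.LOCAL;   // db.getTable() loads in local mode only
      case WITH_10554_AND_11281:
        return true;                        // loading is forced in legacy mode too
      default:
        throw new IllegalArgumentException("unknown branch: " + branch);
    }
  }
}
```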

The point of introducing the "allow_refresh_by_masked_users" flag is to bring back the behavior INVALIDATE/REFRESH had before IMPALA-10554. We should also bring back the same performance for INVALIDATE.
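
One possible shape of the fix, as a rough standalone sketch rather than a patch: skip the column lookup when the flag is on and the request is for an INVALIDATE/REFRESH. Everything here is hypothetical (MaskedRefreshSketch, CountingCatalog, columnsForAuthz, and the assumption that such requests can be recognized by a REFRESH privilege). The real change would have to live in the authorization checker and follow however the flag is actually plumbed to the frontend.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Impala.
final class MaskedRefreshSketch {

  // Assumption: INVALIDATE/REFRESH requests are identifiable by their privilege.
  enum Privilege { REFRESH, SELECT, INSERT }

  // Stand-in for the catalog; counts lookups so the skipped call is observable.
  static final class CountingCatalog {
    int lookups = 0;
    private final Map<String, List<String>> tables = new HashMap<>();

    void addTable(String name, List<String> columns) {
      tables.put(name, columns);
    }

    // Stand-in for db.getTable(...).getColumnNames(), the call that may block.
    List<String> getColumnNames(String name) {
      lookups++;
      return tables.get(name);
    }
  }

  // Returns the column names to hand to the masking-policy check.
  static List<String> columnsForAuthz(CountingCatalog catalog, String table,
      Privilege privilege, boolean allowRefreshByMaskedUsers) {
    if (allowRefreshByMaskedUsers && privilege == Privilege.REFRESH) {
      // Masking policies won't block this request, so there is no need to
      // touch the table and risk triggering metadata loading.
      return Collections.emptyList();
    }
    List<String> columns = catalog.getColumnNames(table);
    return columns == null ? Collections.emptyList() : columns;
  }
}
```

With the flag on, an INVALIDATE request never reaches getColumnNames(), so it can't wait on metadata loading or on the table lock at this point.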

> Add flag to allow metadata-cache operations on masked tables
> ------------------------------------------------------------
>
>                 Key: IMPALA-11501
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11501
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Security
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> "REFRESH <table>" and "INVALIDATE METADATA <table>" are the table-level metadata-cache operations that are only used in Impala (not in Hive, SparkSQL, or others).
> In the Hive-Ranger plugin, when a table is masked (either by a column-masking or a row-filtering policy) for a user, the user can't perform any modification (insert/delete/update) on the table (RANGER-1087, RANGER-1100). However, Hive doesn't have those metadata-cache operations. It's a grey area whether we should block them or not.
> Currently, Impala blocks metadata-cache operations as well (IMPALA-10554, IMPALA-11281). However, it's possible that, before an upgrade, some data-consumer jobs already have REFRESH in them. It'd be better to have a flag to allow such operations for a smooth upgrade process.
> The flag can be something like "allow_refresh_by_masked_users".
> CC [~fangyurao], [~csringhofer]


