Posted to issues@sentry.apache.org by "Anthony Young-Garner (Jira)" <ji...@apache.org> on 2020/06/11 22:49:00 UTC

[jira] [Created] (SENTRY-2556) Provide prefer local option to improve performance when Hive on S3 is used in conjunction with Sentry HA and Sentry-HDFS sync

Anthony Young-Garner created SENTRY-2556:
--------------------------------------------

             Summary: Provide prefer local option to improve performance when Hive on S3 is used in conjunction with Sentry HA and Sentry-HDFS sync
                 Key: SENTRY-2556
                 URL: https://issues.apache.org/jira/browse/SENTRY-2556
             Project: Sentry
          Issue Type: Improvement
          Components: Sentry
    Affects Versions: 2.0.0
            Reporter: Anthony Young-Garner


Performance degrades when Hive on S3 is used in conjunction with Sentry HA and Sentry-HDFS sync and 1) the Hive Metastore Server (HMS) is connected (via the Sentry client) to a remote Sentry Server while 2) HiveServer2 is connected (via the Sentry client) to a local Sentry Server.

TO REPRODUCE:
 # Set up Sentry HA with HDFS sync
 # Configure Hive and HDFS to use S3
 # Create an external table in S3

EXAMPLE: CREATE EXTERNAL TABLE mytesttable (firstname STRING, lastname STRING, address STRING, city STRING, state STRING, zip int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 's3a://ajy-sentry/';

RESULT: Creating a table in S3 can take a very long time (two orders of magnitude slower than table creation in HDFS). Note that this does not always occur (see below for more detail).

To force a test system into the condition that causes the performance degradation:
 # For each HiveServer2 instance, set the sentry.service.client.server.rpc-addresses property to a single value (local to the HiveServer2 instance), then restart that HiveServer2 instance
 # For each HMS instance, set the sentry.service.client.server.rpc-addresses property to a single value (remote to the HMS instance), then restart that HMS instance

-------------

I think the needed code change would be to provide a _prefer local_ option on the SentryTransportPool and/or the SentryGenericServiceClientDefaultImpl so that, when the HMS is on the same node as one of the Sentry servers, the local Sentry server is used. Testing would need to be performed to determine whether this should become the default behavior or should be user-configurable for specific situations.
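
As a rough illustration of the _prefer local_ idea, the sketch below reorders a configured address list so that endpoints on the local host are tried first. It is not Sentry code: the class PreferLocalEndpoints and its methods are hypothetical, the host names and port are example values, and it assumes the value of sentry.service.client.server.rpc-addresses is a comma-separated list of host or host:port entries. How such an ordering would be wired into SentryTransportPool is left open.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Sentry. */
final class PreferLocalEndpoints {
    private PreferLocalEndpoints() {}

    /** Returns the host part of "host" or "host:port" (IPv6 literals are not handled). */
    static String hostOf(String endpoint) {
        String trimmed = endpoint.trim();
        int colon = trimmed.lastIndexOf(':');
        String host = colon >= 0 ? trimmed.substring(0, colon) : trimmed;
        return host.toLowerCase(Locale.ROOT);
    }

    /**
     * Splits a comma-separated address list and moves endpoints whose host is
     * in localHosts to the front. Relative order is otherwise preserved, so
     * remote servers remain available as fallbacks.
     */
    static List<String> order(String rpcAddresses, Set<String> localHosts) {
        Set<String> local = new HashSet<>();
        for (String host : localHosts) {
            local.add(host.trim().toLowerCase(Locale.ROOT));
        }
        List<String> preferred = new ArrayList<>();
        List<String> others = new ArrayList<>();
        for (String raw : rpcAddresses.split(",")) {
            String endpoint = raw.trim();
            if (endpoint.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            if (local.contains(hostOf(endpoint))) {
                preferred.add(endpoint);
            } else {
                others.add(endpoint);
            }
        }
        preferred.addAll(others);
        return preferred;
    }
}
```

For example, calling PreferLocalEndpoints.order("sentry1.example.com:8038,sentry2.example.com:8038", localHosts) on a node whose localHosts set contains sentry2.example.com would place sentry2.example.com:8038 first. The set of local host names could be derived from java.net.InetAddress.getLocalHost(), though whether that matches the names used in the configuration would need to be checked per deployment.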


