You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zhou Xu (Code Review)" <ge...@cloudera.org> on 2019/12/05 06:23:58 UTC

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Zhou Xu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14846


Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................

IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

If Sentry authorization is enabled, users with multi group-policies
will take time to get the result of 'show tables/databases'. It seems
that ResourceAuthorizationProvider.hasAccess performs bad for users
with complex group-policies.

This patch enables MT_DOP query option to accelerate 'show tables/databases'
with multithreaded execution. The time cost for users with complex group-policies
can be reduced from 65.886 seconds to 4.752 seconds when getting 160 avaliable
tables from 910 tables with MT_DOP=16.

Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
---
M be/src/service/client-request-state.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/impala-http-handler.cc
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
M fe/src/test/java/org/apache/impala/authorization/AuthorizationTest.java
9 files changed, 160 insertions(+), 63 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/14846/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 1
Gerrit-Owner: Zhou Xu <pe...@gmail.com>

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 11: Code-Review+1

Patch looks good to me. I will let other reviewers take a look at this as well.


-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 11
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Fri, 20 Dec 2019 18:12:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5325/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 10
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Fri, 20 Dec 2019 11:13:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14846/7/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/14846/7/fe/src/main/java/org/apache/impala/service/Frontend.java@859
PS7, Line 859: d(dbNa
> Using MoreExecutors.sameThreadExecutor seems much better.
About tests: yes, tests/authorization/test_authorization.py is a good place to add a test for this new flag. All the tests there are "custom cluster tests", so they will restart the impala deamons with flags specified in "impalad_args".


http://gerrit.cloudera.org:8080/#/c/14846/9/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/14846/9/fe/src/main/java/org/apache/impala/service/Frontend.java@866
PS9, Line 866:           new CheckAuthorization(dbName, tblName, tableOwner, user)));
nit: +2 indentation


http://gerrit.cloudera.org:8080/#/c/14846/9/fe/src/main/java/org/apache/impala/service/Frontend.java@997
PS9, Line 997:           new CheckAuthorization(db.getName(), null, db.getOwnerUser(), user)));
nit: +2 indentation



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 9
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Wed, 18 Dec 2019 13:35:17 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Zhou Xu (Code Review)" <ge...@cloudera.org>.
Zhou Xu has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................

IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

If Sentry authorization is enabled, users with multi group-policies
will take time to get the result of 'show tables/databases'. It seems
that ResourceAuthorizationProvider.hasAccess performs bad for users
with complex group-policies, IMPALA-9242 will target to address this
problem.

This patch provides a config option 'num_check_access_threads' to accelerate
'show tables/databases' by using multithreading. This configuration is
applicable only when authorization is enabled. A value of 1 disables
multi-threaded execution for checking access. The value must be in the range
of 1 to 32. The default value of 'num_check_access_threads' is 1.

Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
5 files changed, 115 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/14846/6
-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 6
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................

IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

If Sentry authorization is enabled, users with multi group-policies
will take time to get the result of 'show tables/databases'. It seems
that ResourceAuthorizationProvider.hasAccess performs bad for users
with complex group-policies, IMPALA-9242 will target to address this
problem.

This patch provides a config option 'num_check_authorization_threads' to
accelerate 'show tables/databases' by using multithreading. This configuration
is applicable only when authorization is enabled. A value of 1 disables
multi-threaded execution for checking access. However, a small value of larger
than 1 may limit the parallism of FE requests when checking authorization with
a high concurrency. The value must be in the range of 1 to 128. The default
value of 'num_check_access_threads' is 1.

Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Reviewed-on: http://gerrit.cloudera.org:8080/14846
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/authorization/test_authorization.py
6 files changed, 144 insertions(+), 20 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 17
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5331/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 12
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Tue, 24 Dec 2019 12:00:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14846/7/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/14846/7/fe/src/main/java/org/apache/impala/service/Frontend.java@859
PS7, Line 859: submit
> I am ok with the static thread pool if we ensure that we do not regress too
Sorry couldn't review earlier. Yes, by default if the thread pool size is one we should rely on the currentThread to do the access check, otherwise we are essentially creating a bottleneck for all client requests. Having a static threadpool still is helpful if the max concurrency is lower that the threadpool size.



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 7
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Tue, 17 Dec 2019 21:13:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Zhou Xu (Code Review)" <ge...@cloudera.org>.
Zhou Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14846/7/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/14846/7/fe/src/main/java/org/apache/impala/service/Frontend.java@859
PS7, Line 859: submit
> I am not familiar with ExecutorService, so I may be wrong here.
You are right. All the check tasks created concurrently by the users when executing 'show databases/tables' will be submitted to the static thread pool. All the fe_service_threads used by users when executing 'show databases/tables' will be blocked until check tasks being finished. For the current state, 'show databases/ tables' will be executed in separate client request thread. The method of newFixedThreadPool sets corePoolSize and maximumPoolSize to the same value. Now the problem changes to how to set the corePoolSize and maximumPoolSize for ThreadPoolExecutor. Any suggestion ?



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 7
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Tue, 17 Dec 2019 12:52:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5265/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 2
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Thu, 12 Dec 2019 08:38:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5289/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 6
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Mon, 16 Dec 2019 08:17:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Zhou Xu (Code Review)" <ge...@cloudera.org>.
Zhou Xu has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................

IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

If Sentry authorization is enabled, users with multi group-policies
will take time to get the result of 'show tables/databases'. It seems
that ResourceAuthorizationProvider.hasAccess performs bad for users
with complex group-policies, IMPALA-9242 will target to address this
problem.

This patch provides a config option 'num_check_access_threads' to accelerate
'show tables/databases' by using multithreading. This configuration is
applicable only when authorization is enabled. A value of 1 disables
multi-threaded execution for checking access. However, a small value of larger
than 1 may limit the parallism of FE requests when checking authorization with
a high concurrency. The value must be in the range of 1 to 128. The default
value of 'num_check_access_threads' is 1.

Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/authorization/test_authorization.py
6 files changed, 142 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/14846/12
-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 12
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Zhou Xu (Code Review)" <ge...@cloudera.org>.
Zhou Xu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................

IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

If Sentry authorization is enabled, users with multi group-policies
will take time to get the result of 'show tables/databases'. It seems
that ResourceAuthorizationProvider.hasAccess performs bad for users
with complex group-policies.

This patch provides a config option 'num_check_access_threads' to accelerate
'show tables/databases' by using multithreading. The default value of
'num_check_access_threads' is 16.

Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
5 files changed, 97 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/14846/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 2
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Zhou Xu (Code Review)" <ge...@cloudera.org>.
Zhou Xu has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................

IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

If Sentry authorization is enabled, users with multi group-policies
will take time to get the result of 'show tables/databases'. It seems
that ResourceAuthorizationProvider.hasAccess performs bad for users
with complex group-policies, IMPALA-9242 will target to address this
problem.

This patch provides a config option 'num_check_authorization_threads' to
accelerate 'show tables/databases' by using multithreading. This configuration
is applicable only when authorization is enabled. A value of 1 disables
multi-threaded execution for checking access. However, a small value of larger
than 1 may limit the parallism of FE requests when checking authorization with
a high concurrency. The value must be in the range of 1 to 128. The default
value of 'num_check_access_threads' is 1.

Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/authorization/test_authorization.py
6 files changed, 144 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/14846/15
-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 15
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Zhou Xu (Code Review)" <ge...@cloudera.org>.
Zhou Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14846/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14846/7//COMMIT_MSG@15
PS7, Line 15: num_check_authorization_
> I would prefer a name the contains "authorization" or "auth". The current n
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 9
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Wed, 18 Dec 2019 07:40:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 16:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5370/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 16
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Fri, 27 Dec 2019 11:34:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Zhou Xu (Code Review)" <ge...@cloudera.org>.
Zhou Xu has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................

IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

If Sentry authorization is enabled, users with multi group-policies
will take time to get the result of 'show tables/databases'. It seems
that ResourceAuthorizationProvider.hasAccess performs bad for users
with complex group-policies, IMPALA-9242 will target to address this
problem.

This patch provides a config option 'num_check_authorization_threads' to
accelerate 'show tables/databases' by using multithreading. This configuration
is applicable only when authorization is enabled. A value of 1 disables
multi-threaded execution for checking access. However, a small value of larger
than 1 may limit the parallism of FE requests when checking authorization with
a high concurrency. The value must be in the range of 1 to 128. The default
value of 'num_check_access_threads' is 1.

Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/authorization/test_authorization.py
6 files changed, 143 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/14846/14
-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 14
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5332/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 13
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Tue, 24 Dec 2019 12:23:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14846/7/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/14846/7/fe/src/main/java/org/apache/impala/service/Frontend.java@859
PS7, Line 859: submit
> You are right. All the check tasks created concurrently by the users when e
I am ok with the static thread pool if we ensure that we do not regress too much for the default case. This could be done by Guava's MoreExecutors.sameThreadExecutor(), Impala already uses it:

https://github.com/apache/impala/blob/f2f348c0f93208a0f34c33b6a4dc82f4d9d4b290/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java#L141

This would mean that by default the tasks will be executed on the current thread, which should add minimal overhead to the current solution. Meanwhile bumping num_check_access_threads could help some users with SHOW DATABASES/TABLES slowness if the number of parallel calls is not too large.

If there are lot of parallel SHOW DATABASES/TABLES and they are also slow, then I think that there is not much we can do, the user should add more Coordinator nodes.



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 7
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Tue, 17 Dec 2019 13:51:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14846/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14846/1//COMMIT_MSG@14
PS1, Line 14: This patch enables MT_DOP query option to accelerate 'show tables/databases'
Let's not overload mt_dop. mt_dop should only be about query execution, not metadata loading.

On a side node, I'm not sure whether a query option is the right place to control this. For other metadata operations we use fixed-size thread pools for the most part. The problem with having parallelism per-operation is that you don't have any control over the amount of load induced on the underlying metadata systems. There's also nothing to stop end-users from setting it to an absurd value.

Anyway, I'll let reviewers figure out what would be best there, my main point was please don't use mt_dop for this, we want to reserve that for the backend.



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 1
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Dec 2019 17:08:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 15: Code-Review+2

Thanks for addressing all the comments! Carry on Vihang's +1.


-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 15
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Fri, 27 Dec 2019 11:34:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 2:

(9 comments)

I think the patch's approach makes sense in the short term. However, I think a better long term solution would be to only list privileges for a given Authorizable object for the user instead of listing all the privileges for the user. This causes the underlying checkAccess method in Sentry library to be highly inefficient. Currently, I see that the authorizables field is ignored in this method:

https://github.com/apache/sentry/blob/branch-2.1.0/sentry-provider/sentry-provider-cache/src/main/java/org/apache/sentry/provider/cache/SimpleCacheProviderBackend.java#L64

What this effectively means is that for each access each, instead of checking against the privileges for that object for the user, we are currently checking against all the privileges for the user. When you add fine-grained privileges it increases the number of privilege checks and I think this is the cause of slowdown. I created IMPALA-9242 which has more details.

http://gerrit.cloudera.org:8080/#/c/14846/2/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/14846/2/fe/src/main/java/org/apache/impala/service/Frontend.java@762
PS2, Line 762: private String dbName_;
             :     private String tblName_;
             :     private String owner_;
             :     private User user_;
change to final variables?


http://gerrit.cloudera.org:8080/#/c/14846/2/fe/src/main/java/org/apache/impala/service/Frontend.java@767
PS2, Line 767: CheckAccess
Can you add some documentation for this class. Specifically, which fields can be null or not. Also, may be add a Preconditions.checkNotNull for dbName, owner and user?


http://gerrit.cloudera.org:8080/#/c/14846/2/fe/src/main/java/org/apache/impala/service/Frontend.java@774
PS2, Line 774: public
nit, Adding @Override makes is more readable in my opinion.


http://gerrit.cloudera.org:8080/#/c/14846/2/fe/src/main/java/org/apache/impala/service/Frontend.java@797
PS2, Line 797: Remove corresponding elements according to pendingCheckingTasks
May be say something like "This method filters out elements from the given list based on the the results of the pendingCheckTasks? 

Also, can you rename pendingCheckingTasks to pendingCheckTasks in this file?


http://gerrit.cloudera.org:8080/#/c/14846/2/fe/src/main/java/org/apache/impala/service/Frontend.java@799
PS2, Line 799: void
Is this method package-private for a reason? If not, lets keep it a private method.


http://gerrit.cloudera.org:8080/#/c/14846/2/fe/src/main/java/org/apache/impala/service/Frontend.java@799
PS2, Line 799: filterUnaccessiableElements
nit, misspelled method name. Should be "filterUnaccessibleElements"


http://gerrit.cloudera.org:8080/#/c/14846/2/fe/src/main/java/org/apache/impala/service/Frontend.java@800
PS2, Line 800: list
may be use a better name?


http://gerrit.cloudera.org:8080/#/c/14846/2/fe/src/main/java/org/apache/impala/service/Frontend.java@803
PS2, Line 803: Iterator
Can we add a Preconditions check to make sure that the size of list and pendingCheckTasks is the same?


http://gerrit.cloudera.org:8080/#/c/14846/2/fe/src/main/java/org/apache/impala/service/Frontend.java@808
PS2, Line 808: iter.remove();
nit, if you want to avoid braces, may be inline this too. Else using braces is more consistent with the coding style conventions we use.



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 2
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Fri, 13 Dec 2019 01:14:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Zhou Xu (Code Review)" <ge...@cloudera.org>.
Zhou Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14846/12/tests/authorization/test_authorization.py
File tests/authorization/test_authorization.py:

http://gerrit.cloudera.org:8080/#/c/14846/12/tests/authorization/test_authorization.py@674
PS12, Line 674: 
> flake8: E501 line too long (102 > 90 characters)
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 13
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Tue, 24 Dec 2019 11:53:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Zhou Xu (Code Review)" <ge...@cloudera.org>.
Zhou Xu has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................

IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

If Sentry authorization is enabled, users with multi group-policies
will take time to get the result of 'show tables/databases'. It seems
that ResourceAuthorizationProvider.hasAccess performs bad for users
with complex group-policies, IMPALA-9242 will target to address this
problem.

This patch provides a config option 'num_check_access_threads' to accelerate
'show tables/databases' by using multithreading. This configuration is
applicable only when authorization is enabled. A value of 1 disables
multi-threaded execution for checking access. However, a small value of larger
than 1 may limit the parallism of FE requests when checking authorization with
a high concurrency. The value must be in the range of 1 to 128. The default
value of 'num_check_access_threads' is 1.

Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/authorization/test_authorization.py
6 files changed, 143 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/14846/13
-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 13
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 16: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 16
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Fri, 27 Dec 2019 11:34:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5306/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 9
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Wed, 18 Dec 2019 07:48:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Zhou Xu (Code Review)" <ge...@cloudera.org>.
Zhou Xu has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................

IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

If Sentry authorization is enabled, users with multi group-policies
will take time to get the result of 'show tables/databases'. It seems
that ResourceAuthorizationProvider.hasAccess performs bad for users
with complex group-policies, IMPALA-9242 will target to address this
problem.

This patch provides a config option 'num_check_authorization_threads' to
accelerate 'show tables/databases' by using multithreading. This configuration
is applicable only when authorization is enabled. A value of 1 disables
multi-threaded execution for checking authorization. The value must be in the
range of 1 to 32. The default value of 'num_check_authorization_threads' is 1.

Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
5 files changed, 121 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/14846/9
-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 9
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Zhou Xu (Code Review)" <ge...@cloudera.org>.
Zhou Xu has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................

IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

If Sentry authorization is enabled, users with multi group-policies
will take time to get the result of 'show tables/databases'. It seems
that ResourceAuthorizationProvider.hasAccess performs bad for users
with complex group-policies, IMPALA-9242 will target to address this
problem.

This patch provides a config option 'num_check_authorization_threads' to
accelerate 'show tables/databases' by using multithreading. This configuration
is applicable only when authorization is enabled. A value of 1 disables
multi-threaded execution for checking authorization. The value must be in the
range of 1 to 32. The default value of 'num_check_authorization_threads' is 1.

Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/authorization/test_authorization.py
6 files changed, 140 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/14846/10
-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 10
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14846/12/tests/authorization/test_authorization.py
File tests/authorization/test_authorization.py:

http://gerrit.cloudera.org:8080/#/c/14846/12/tests/authorization/test_authorization.py@674
PS12, Line 674: S
flake8: E501 line too long (102 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 12
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Tue, 24 Dec 2019 11:32:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14846/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14846/1//COMMIT_MSG@14
PS1, Line 14: This patch enables MT_DOP query option to accelerate 'show tables/databases'
> Let's not overload mt_dop. mt_dop should only be about query execution, not
+1 to this comment above. Lets not use mt_dop for this.


http://gerrit.cloudera.org:8080/#/c/14846/1/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/14846/1/fe/src/main/java/org/apache/impala/service/Frontend.java@a778
PS1, Line 778: 
             : 
             : 
Looks like this check is being removed in the code. Is that intentional?


http://gerrit.cloudera.org:8080/#/c/14846/1/fe/src/main/java/org/apache/impala/service/Frontend.java@799
PS1, Line 799:  poolSize = mtDop > 0 ? mtDop
Instead of using mt_dop to get the poolsize, I think it would be good to have a separate config to determine the pool size. Also, since getDbs and getTableNames calls are so frequent (I have seen systems where you can have 10-20 of such calls in parallel at a time), may be it would be good to use a single static thread pool which can accept such tasks. Otherwise, in the the example above, we may have 10*10 threads doing the checkAccess calls if the pool size is 10.


http://gerrit.cloudera.org:8080/#/c/14846/1/fe/src/main/java/org/apache/impala/service/Frontend.java@977
PS1, Line 977: 
             :         // Wait for the db checking tasks to be finished.
             :         int failedCheckTasks = 0;
             :         int index = 0;
             :         iter = dbs.iterator();
             :         while (iter.hasNext()) {
             :           iter.next();
             :           try {
             :             if (!pendingCheckingTasks.get(index).get())
             :               iter.remove();
             :             index ++;
             :           } catch (ExecutionException | InterruptedException e) {
             :             failedCheckTasks ++;
             :             LOG.error("Encountered an error checking access for dbs", e);
             :           }
             :         }
It looks like a lot of this code is duplicated. Can we refactor this into a method which can be called from both getTableNames and getDbs?


http://gerrit.cloudera.org:8080/#/c/14846/1/fe/src/main/java/org/apache/impala/service/Frontend.java@1032
PS1, Line 1032: hrow new IllegalArgumentException("Params dbName should not be null.")
Instead of this, may be its better to have a Preconditions.checkNotNull(dbName) at line 1010.



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 1
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Dec 2019 00:50:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5275/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 4
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Fri, 13 Dec 2019 07:50:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5290/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 7
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Mon, 16 Dec 2019 08:31:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14846/6/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/14846/6/be/src/common/global-flags.cc@299
PS6, Line 299:     "tables/databases. This configuration is applicable only when authorization is enabled."
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/14846/6/be/src/common/global-flags.cc@300
PS6, Line 300:     "A value of 1 disables multi-threaded execution for checking access. The value must be "
line too long (92 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 6
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Mon, 16 Dec 2019 07:47:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Zhou Xu (Code Review)" <ge...@cloudera.org>.
Zhou Xu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14846/6/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/14846/6/be/src/common/global-flags.cc@299
PS6, Line 299:     "tables/databases. This configuration is applicable only when authorization is "
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/14846/6/be/src/common/global-flags.cc@300
PS6, Line 300:     "enabled. A value of 1 disables multi-threaded execution for checking access. "
> line too long (92 > 90)
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 7
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Mon, 16 Dec 2019 08:02:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 11:

(8 comments)

This patch looks good to me! Leave some comments about error handling and nits.

http://gerrit.cloudera.org:8080/#/c/14846/11/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/14846/11/be/src/common/global-flags.cc@306
PS11, Line 306: A value of 1 disables multi-threaded execution for checking authorization.
It may worth to mention that when set larger than 1, the parallism of FE requests may be limit by this. So the admin won't set this to 2 or 3 hoping SHOW statements get 2-3 times speed-up.


http://gerrit.cloudera.org:8080/#/c/14846/11/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/14846/11/fe/src/main/java/org/apache/impala/service/Frontend.java@202
PS11, Line 202: 32
I think 32 is not large enough since the default value of fe_service_threads is 64. So there're at most 128 parallel requests (64 for beeswax, 64 for hs2). Maybe 128 is more reasonable. Looks like no harm to set it larger.


http://gerrit.cloudera.org:8080/#/c/14846/11/fe/src/main/java/org/apache/impala/service/Frontend.java@806
PS11, Line 806:     /* dbName and user cannot be null, tblName and owner can be null. */
Method comments start by '/**', end by '*/' and explain what the method does. This seems not a method comment. Maybe put it inside the method?


http://gerrit.cloudera.org:8080/#/c/14846/11/fe/src/main/java/org/apache/impala/service/Frontend.java@854
PS11, Line 854:  
nit: don't need space


http://gerrit.cloudera.org:8080/#/c/14846/11/fe/src/main/java/org/apache/impala/service/Frontend.java@856
PS11, Line 856:  
nit: don't need space


http://gerrit.cloudera.org:8080/#/c/14846/11/fe/src/main/java/org/apache/impala/service/Frontend.java@857
PS11, Line 857:  
nit: don't need space


http://gerrit.cloudera.org:8080/#/c/14846/11/fe/src/main/java/org/apache/impala/service/Frontend.java@857
PS11, Line 857:         LOG.error("Encountered an error checking access" , e);
Should we break the loop in this case?


http://gerrit.cloudera.org:8080/#/c/14846/11/tests/authorization/test_authorization.py
File tests/authorization/test_authorization.py:

http://gerrit.cloudera.org:8080/#/c/14846/11/tests/authorization/test_authorization.py@674
PS11, Line 674: test_sentry_show_stmts_with_any
It'd be better to use another log folder. This is used in a previous test.



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 11
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Tue, 24 Dec 2019 06:44:22 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Zhou Xu (Code Review)" <ge...@cloudera.org>.
Zhou Xu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................

IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

If Sentry authorization is enabled, users with multi group-policies
will take time to get the result of 'show tables/databases'. It seems
that ResourceAuthorizationProvider.hasAccess performs bad for users
with complex group-policies, IMPALA-9242 will target to address this
problem.

This patch provides a config option 'num_check_access_threads' to accelerate
'show tables/databases' by using multithreading. The default value of
'num_check_access_threads' is 16.

Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
5 files changed, 106 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/14846/4
-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 4
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 4:

(5 comments)

Thanks for making the suggested changes. Left some comments related to initialization logic and default value of the config.

http://gerrit.cloudera.org:8080/#/c/14846/4/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/14846/4/be/src/common/global-flags.cc@297
PS4, Line 297: 16
Do you think we should disable this by default? Since this patch only applies when authorization is enabled, turning this on by default may not be the best use of resources for all the users. Also, the slow access check doesn't affect Ranger based authorization.


http://gerrit.cloudera.org:8080/#/c/14846/4/be/src/common/global-flags.cc@299
PS4, Line 299: authentication
Can you some more description here related to acceptable values? We also don't want the users to set any unreasonable values (like negative, 0 or too large a number).

May be add something like: 
The number of threads used to check access for the user when executing show tables/databases. This configuration is applicable only when authorization is enabled. A value of 0 or 1 disables multi-threaded execution for checking access. The value must be in the range of 0 to 32.


http://gerrit.cloudera.org:8080/#/c/14846/4/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/14846/4/fe/src/main/java/org/apache/impala/service/Frontend.java@273
PS4, Line 273: ExecutorService
this can be final


http://gerrit.cloudera.org:8080/#/c/14846/4/fe/src/main/java/org/apache/impala/service/Frontend.java@274
PS4, Line 274: BackendConfig.INSTANCE.getNumCheckAccessThreads()
Can you initialize this using a private static method instead? That way you can validate the configuration value like say Preconditions.checkState(numThreads >= 0 && numThreads < MAX_CHECK_ACCESS_POOL_SIZE);

Also, currently the threadpool will be created no matter if authorization is enabled or not. So may be we should only initialize the pool when authorization is enabled.


http://gerrit.cloudera.org:8080/#/c/14846/4/fe/src/main/java/org/apache/impala/service/Frontend.java@808
PS4, Line 808: checkSet
nit, may be rename to checkList, since its a list not a set.



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 4
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Fri, 13 Dec 2019 19:53:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 7:

(3 comments)

The implementation looks good to me, but I am concerned about possible performance impact, see my comment in Frontend.java

http://gerrit.cloudera.org:8080/#/c/14846/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14846/7//COMMIT_MSG@15
PS7, Line 15: num_check_access_threads
I would prefer a name the contains "authorization" or "auth". The current name could mean many things.


http://gerrit.cloudera.org:8080/#/c/14846/7//COMMIT_MSG@18
PS7, Line 18: range
            : of 1 to 32. The default value of 'num_check_access_threads' is 1.
nit: wrap at 72


http://gerrit.cloudera.org:8080/#/c/14846/7/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/14846/7/fe/src/main/java/org/apache/impala/service/Frontend.java@859
PS7, Line 859: submit
I am not familiar with ExecutorService, so I may be wrong here.

What happens if SHOW DATABASES is called by several users concurrently? Does MAX_CHECK_ACCESS_POOL_SIZE maximize the number of threads that can work in parallel, or the current thread (where doGetTableNames is called) can also do some work?

If no more than MAX_CHECK_ACCESS_POOL_SIZE threads can work at the same time and the default value 1 is used, then this change serializes parallel SHOW DATABASES calls which is a regression compared to the current state.

I can imagine that in certain cases bumping MAX_CHECK_ACCESS_POOL_SIZE can help a lot, but we should ensure that the default 1 does not cause significant regression.

related stackoverflow question: https://stackoverflow.com/questions/6581188/is-there-an-executorservice-that-uses-the-current-thread



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 7
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Mon, 16 Dec 2019 13:14:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14846/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14846/13//COMMIT_MSG@15
PS13, Line 15: num_check_access_threads
The codes still use the old name. BTW, What about "num_authz_checking_threads" (Number of authorization checking threads) ? So it looks like other existing flags like "num_metadata_loading_threads".



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 13
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Wed, 25 Dec 2019 02:26:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5340/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 15
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Fri, 27 Dec 2019 07:22:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14846/11/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/14846/11/fe/src/main/java/org/apache/impala/service/Frontend.java@857
PS11, Line 857:         LOG.error("Encountered an error checking access", e);
> Continuing the loop when it comes into an exception can help get all the ex
I think they are probably the same kind of exception so don't need to log them all. Most importantly, before this patch, we stop when the first exception is thrown. After this patch, we have to iterate all the candidates and wait until they all fail, which could take time if the exception is some sort of connection timeout errors. I think we should act as the current behavior. Thoughts?



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 14
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Wed, 25 Dec 2019 13:30:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14846/1/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/14846/1/fe/src/main/java/org/apache/impala/service/Frontend.java@835
PS1, Line 835:             LOG.error("Encountered an error checking access for tables in db: " +dbName, e);
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/14846/1/fe/src/main/java/org/apache/impala/service/MetadataOp.java
File fe/src/main/java/org/apache/impala/service/MetadataOp.java:

http://gerrit.cloudera.org:8080/#/c/14846/1/fe/src/main/java/org/apache/impala/service/MetadataOp.java@364
PS1, Line 364:         for (String tabName: fe.getTableNames(db.getName(), tablePatternMatcher, user, 0)) {
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/14846/1/fe/src/test/java/org/apache/impala/authorization/AuthorizationTest.java
File fe/src/test/java/org/apache/impala/authorization/AuthorizationTest.java:

http://gerrit.cloudera.org:8080/#/c/14846/1/fe/src/test/java/org/apache/impala/authorization/AuthorizationTest.java@239
PS1, Line 239:     dbs = AUTHZ_FE.getDbs(PatternMatcher.createHivePatternMatcher("functional_rc"), USER, 0);
line too long (93 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 1
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Dec 2019 06:24:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5333/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 14
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Wed, 25 Dec 2019 03:35:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Zhou Xu (Code Review)" <ge...@cloudera.org>.
Zhou Xu has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................

IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

If Sentry authorization is enabled, users with multi group-policies
will take time to get the result of 'show tables/databases'. It seems
that ResourceAuthorizationProvider.hasAccess performs bad for users
with complex group-policies, IMPALA-9242 will target to address this
problem.

This patch provides a config option 'num_check_access_threads' to accelerate
'show tables/databases' by using multithreading. This configuration is
applicable only when authorization is enabled. A value of 1 disables
multi-threaded execution for checking access. The value must be in the range
of 1 to 32. The default value of 'num_check_access_threads' is 1.

Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
5 files changed, 115 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/14846/7
-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 7
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5207/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 1
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Dec 2019 06:53:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5326/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 11
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Fri, 20 Dec 2019 11:53:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................


Patch Set 16: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 16
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>
Gerrit-Comment-Date: Fri, 27 Dec 2019 16:09:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

Posted by "Zhou Xu (Code Review)" <ge...@cloudera.org>.
Zhou Xu has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/14846 )

Change subject: IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'
......................................................................

IMPALA-9195: Using multithreaded execution to accelerate 'show tables/databases'

If Sentry authorization is enabled, users with multi group-policies
will take time to get the result of 'show tables/databases'. It seems
that ResourceAuthorizationProvider.hasAccess performs bad for users
with complex group-policies, IMPALA-9242 will target to address this
problem.

This patch provides a config option 'num_check_authorization_threads' to
accelerate 'show tables/databases' by using multithreading. This configuration
is applicable only when authorization is enabled. A value of 1 disables
multi-threaded execution for checking authorization. The value must be in the
range of 1 to 32. The default value of 'num_check_authorization_threads' is 1.

Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M tests/authorization/test_authorization.py
6 files changed, 140 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/14846/11
-- 
To view, visit http://gerrit.cloudera.org:8080/14846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I860e0d18afa0421665f8b3b1c5561d6bdacc5e96
Gerrit-Change-Number: 14846
Gerrit-PatchSet: 11
Gerrit-Owner: Zhou Xu <pe...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zhou Xu <pe...@gmail.com>