You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org> on 2016/06/14 12:03:22 UTC

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Bharath Vissapragada has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3381

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................

IMPALA-2347: Reuse metastore client connections in Catalog

Currently we create a new connection to metastore everytime Catalog
connects to HMS. This was intentionally done to circumvent HIVE-5181.
Given it is fixed already in Hive, this patch intends to refactor the
HMS client usage on the catalog to reuse the connections. Additionally
this patch increases the default intial metastore pool size to 10 from
a previous value of 5, which is less for even for decent DDL load.

Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
---
M fe/src/main/java/com/cloudera/impala/catalog/Catalog.java
M fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
2 files changed, 2 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/81/3381/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 5:

Yeah, I was thinking about that too Bharath. Can you change the behavior to enforce a maximum pool size?

-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 8:

Ran the JMeter tests again on the latest patch. The throughput was around 267/minute which is roughly same as before. Looks like there is no regression atleast. Please let me know if you want me to run any specific tests.

-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has uploaded a new patch set (#5).

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................

IMPALA-2347: Reuse metastore client connections in Catalog

Currently we create a new connection to metastore every time Catalog
connects to HMS. This was intentionally done to circumvent HIVE-5181.
Given it is fixed already in Hive, this patch intends to refactor the
HMS client usage on the catalog to reuse the connections. Additionally
this patch makes MetaStoreClient implement AutoCloseable interface and
hence all the callers can use the try-with-resources to create a new
metastore client and needn't explicitly call release(). Also, this
patch increases the default initial metastore pool size to 10 from a
previous value of 5, which is less even for a decent DDL load.

Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
---
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/Catalog.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
M fe/src/main/java/com/cloudera/impala/catalog/TableLoader.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/test/java/com/cloudera/impala/catalog/CatalogTest.java
8 files changed, 68 insertions(+), 151 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/81/3381/5
-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 5:

@Dimitris: Do we need to set a maximum size for the connection pool (beyond which we close the connection directly instead of offering to the pool)? Else a sudden burst of DDL load can create a big pool which remains unused for rest of Catalog's lifetime.

-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 8:

@Dimitris: The latest patch just passed a core build. So its ready for review. 
http://sandbox.jenkins.cloudera.com/job/impala-private-build-and-test/3686/

-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has uploaded a new patch set (#7).

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................

IMPALA-2347: Reuse metastore client connections in Catalog

Currently we create a new connection to metastore every time Catalog
connects to HMS. This was intentionally done to circumvent HIVE-5181.
Given it is fixed already in Hive, this patch intends to refactor the
HMS client usage on the catalog to reuse the connections. Additionally
this patch makes MetaStoreClient implement AutoCloseable interface and
hence all the callers can use the try-with-resources to create a new
metastore client and needn't explicitly call release(). Also, this
patch increases the default initial metastore pool size to 10 from a
previous value of 5, which is less even for a decent DDL load.

Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
---
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/Catalog.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
M fe/src/main/java/com/cloudera/impala/catalog/TableLoader.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/test/java/com/cloudera/impala/catalog/CatalogTest.java
8 files changed, 73 insertions(+), 153 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/81/3381/7
-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has uploaded a new patch set (#3).

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................

IMPALA-2347: Reuse metastore client connections in Catalog

Currently we create a new connection to metastore everytime Catalog
connects to HMS. This was intentionally done to circumvent HIVE-5181.
Given it is fixed already in Hive, this patch intends to refactor the
HMS client usage on the catalog to reuse the connections. Additionally
this patch increases the default initial metastore pool size to 10 from
a previous value of 5, which is less even for a decent DDL load.

Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
---
M fe/src/main/java/com/cloudera/impala/catalog/Catalog.java
M fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
2 files changed, 2 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/81/3381/3
-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3381/2/fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
File fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java:

Line 76:     @Override
> It could be due to two things, 
oh.. then that is ok then.. if we continue creating new connections then there will not be any short of connections.. I was just worried about if we stop creating new connections before we fix all connection leak then catalog may be stuck..


-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 8:

Dimitris, here is the patch that switches to  RetryingMetaStoreClient. The above test that deterministically failed not passes. I'm running an exhaustive build just to make sure nothing else broke. Feel free to take a look meanwhile.

-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 7:

(1) This option seems to be the default Hive has been using since CDH 5.0 [a,b]. I see there are a few open issues [c], but I don't see anything too serious  and given it has been in use for quite a while, I guess its worth a try.

(2) I'll attempt a patch with this change and may be we can stress test it and see if we hit any issues. Also I'll run the Jmeter tests on it.

(3) If (2) doesn't work out for some reason, I'll attempt an implementation with our own retry mechanism.

[a] https://github.com/cloudera/hive/commit/42b8eba0039d8701e2d6ed3496418c640b753260
[b] https://issues.apache.org/jira/browse/HIVE-3400
[c] https://issues.apache.org/jira/browse/HIVE-13014?jql=status%20%3D%20Open%20AND%20text%20~%20%22%5C%22RetryingMetaStoreClient%5C%22%22

-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3381/3//COMMIT_MSG
Commit Message:

PS3, Line 13: 10
You may want to set the load_catalog_in_background to true and measure the time to do a full metadata load before and after this change. Ideally, we should also use something like Jmeter to measure throughput/latency for DDL operations and see if this change has any effect.


http://gerrit.cloudera.org:8080/#/c/3381/3/fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
File fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java:

PS3, Line 50: MetaStoreClient
If you're up to it it would be nice if this class implemented the AutoCloseable interface so that then we could use try-with-resources statement. Right now we need to explicitly call release on the finally block.


-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3381/4//COMMIT_MSG
Commit Message:

PS4, Line 9: everytime
every time


http://gerrit.cloudera.org:8080/#/c/3381/4/fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
File fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java:

PS4, Line 57: this.
Remove here and everywhere else. Not needed...


PS4, Line 85: if (hiveClient_ != null)
Why is this check needed? Who can set hiveClient_ to null?


Line 94:       isInUse_ = true;
Maybe we should also add a Preconditions.checkState(!isInUse_); here


-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3381/3//COMMIT_MSG
Commit Message:

PS3, Line 13: 10
> You may want to set the load_catalog_in_background to true and measure the 
In my local setting I don't see much difference in the startup time. One thing I noticed though is that there were >1000 thread pool hits (instead of creating a new connection). I think its much more effective in larger catalogs and kerberized environments where a new connection to the metastore is costly and involves a roundtrip to the KDC, although I don't have the numbers to back this up. As discussed in the chat, please send me that zip so that I can profile it further (probably on a per connection basis).


http://gerrit.cloudera.org:8080/#/c/3381/3/fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
File fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java:

PS3, Line 50: MetaStoreClient
> If you're up to it it would be nice if this class implemented the AutoClose
Nice idea. Done.


-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has uploaded a new patch set (#2).

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................

IMPALA-2347: Reuse metastore client connections in Catalog

Currently we create a new connection to metastore everytime Catalog
connects to HMS. This was intentionally done to circumvent HIVE-5181.
Given it is fixed already in Hive, this patch intends to refactor the
HMS client usage on the catalog to reuse the connections. Additionally
this patch increases the default intial metastore pool size to 10 from
a previous value of 5, which is less even for a decent DDL load.

Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
---
M fe/src/main/java/com/cloudera/impala/catalog/Catalog.java
M fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
2 files changed, 2 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/81/3381/2
-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 8:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/3381/8/fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
File fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java:

PS8, Line 31: RetryingMetaStoreClient
> what does it retry? Are operations idempotent under retry?
- The retries are on individual metadata ops implemented by IMetastoreClient interface (for ex: getDbs(), alterTable()...). More details at [1].
- I'm not sure if it is totally idempotent, say in edge cases like network issues, where the HMS has committed the transaction but the Client doesn't get the confirmation. I think it would still retry and see a different exception. But I think the Catalog code is designed to handle these cases well. Not sure if this answers your question, please let me know if you are looking for any specific details.

[1] https://github.com/cloudera/hive/blob/cdh5-1.1.0_5.7.0/metastore/src/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java#L90


PS8, Line 32: There is no size limit.
> Can you clarify this comment now? There is a size limit to the number of ca
Updated the comment. Not relevant anymore.


PS8, Line 39: threadpool
> threadpool??
Updated.


Line 54:   private HiveMetaHookLoader dummyHookLoader = new HiveMetaHookLoader() {
> make it final?
Done


Line 54:   private HiveMetaHookLoader dummyHookLoader = new HiveMetaHookLoader() {
> and static?
Done


PS8, Line 98: exceed
> exceeds
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 7:

I agree that option #3 seems like the best option, but we need to answer a few questions first:
1. How stable is that implementation? Is it currently used in production and since when?
2. Are there any performance implications if we switch to the retrying metastore client? We probably need to repeat the perf evaluation you did using JMeter. It wouldn't be a bad idea to share the perf methodology and results with Mostafa. He may have more interesting use cases/workloads that are worth testing.
3. Another option, in case we don't feel comfortable with the existing implementation of the retry client, is to create our own retry logic. Not ideal but just something to consider.

-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Huaisi Xu (Code Review)" <ge...@cloudera.org>.
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3381/2//COMMIT_MSG
Commit Message:

Line 13: this patch increases the default intial metastore pool size to 10 from
typo: initial. Why did you decide to increase this? I have seen connection leak (current connection is >hundreds). does it mean that we never release those connection? if we still have that problem then probably catalog will get stuck as the release is never called. ?


http://gerrit.cloudera.org:8080/#/c/3381/2/fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
File fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java:

Line 76:     public void release() {
Some time I saw connection to be in the hundreds which means this may not be called at all. so we may fail to offer this client back and catalog may stave on client.


-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 9:

Henry, ping.

-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 4:

The profiling and results look reasonable to me. At this point I don't think we need to spend any more time tuning this. Thanks for looking into it.

-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has uploaded a new patch set (#4).

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................

IMPALA-2347: Reuse metastore client connections in Catalog

Currently we create a new connection to metastore everytime Catalog
connects to HMS. This was intentionally done to circumvent HIVE-5181.
Given it is fixed already in Hive, this patch intends to refactor the
HMS client usage on the catalog to reuse the connections. Additionally
this patch makes MetaStoreClient implement AutoCloseable interface and
hence all the callers can use the try-with-resources to create a new
metastore client and needn't explicitly call release(). Also, this
patch increases the default initial metastore pool size to 10 from a
previous value of 5, which is less even for a decent DDL load.

Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
---
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/Catalog.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
M fe/src/main/java/com/cloudera/impala/catalog/TableLoader.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/test/java/com/cloudera/impala/catalog/CatalogTest.java
8 files changed, 65 insertions(+), 149 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/81/3381/4
-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has uploaded a new patch set (#6).

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................

IMPALA-2347: Reuse metastore client connections in Catalog

Currently we create a new connection to metastore every time Catalog
connects to HMS. This was intentionally done to circumvent HIVE-5181.
Given it is fixed already in Hive, this patch intends to refactor the
HMS client usage on the catalog to reuse the connections. Additionally
this patch makes MetaStoreClient implement AutoCloseable interface and
hence all the callers can use the try-with-resources to create a new
metastore client and needn't explicitly call release(). Also, this
patch increases the default initial metastore pool size to 10 from a
previous value of 5, which is less even for a decent DDL load.

Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
---
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/Catalog.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
M fe/src/main/java/com/cloudera/impala/catalog/TableLoader.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/test/java/com/cloudera/impala/catalog/CatalogTest.java
8 files changed, 71 insertions(+), 152 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/81/3381/6
-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3381/6/fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
File fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java:

PS6, Line 35: 32
> Out of curiosity, do we know if HMS has a limit on the number of open conne
This seems to be controlled by two params "hive.metastore.server.min.threads" (defaults to 200) and "hive.metastore.server.max.threads" (defaults to 100000), which imo are very large.


Line 82:       // Ensure the connection isn't returned to the pool if the pool has been closed.
> "or if the number of connections in the pool exceed MAX_HMS_CONNECTION_POOL
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3381/4//COMMIT_MSG
Commit Message:

PS4, Line 9: everytime
> every time
Done


http://gerrit.cloudera.org:8080/#/c/3381/4/fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
File fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java:

PS4, Line 57: this.
> Remove here and everywhere else. Not needed...
Done


PS4, Line 85: if (hiveClient_ != null)
> Why is this check needed? Who can set hiveClient_ to null?
Just added to be on safe side incase someone inadvertently  sets to null. Removed.


Line 94:       isInUse_ = true;
> Maybe we should also add a Preconditions.checkState(!isInUse_); here
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 7: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3381/4/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
File fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java:

PS4, Line 539: 
This seems to be a connection leak. We are not closing this anywhere.


-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3381

to look at the new patch set (#9).

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................

IMPALA-2347: Reuse metastore client connections in Catalog

Currently we create a new connection to metastore every time Catalog
connects to HMS. This was intentionally done to circumvent HIVE-5181.
Given it is fixed already in Hive, this patch intends to refactor the
HMS client usage on the catalog to reuse the connections. Additionally
this patch makes MetaStoreClient implement AutoCloseable interface and
hence all the callers can use the try-with-resources to create a new
metastore client and needn't explicitly call release(). Also, this
patch increases the default initial metastore pool size to 10 from a
previous value of 5, which is less even for a decent DDL load.

In terms of design, this patch switches the metastore client
implementation to RetryingMetaStoreClient from previous implementation
of HiveMetaStoreClient. The reason for this switch is to handle HMS
failures from Catalog side where the entire metastore client pool
cache becomes stale in the event of a metastore restart and there is
no proper way to deal with it. RetryingMetaStoreClient has inbuilt
retry mechanism which reconnects stale connections in the event of
failures. For more details on retries and corresponding configurations,
check org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.

Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
---
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/Catalog.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/DataSourceTable.java
M fe/src/main/java/com/cloudera/impala/catalog/HBaseTable.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/IncompleteTable.java
M fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java
M fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
M fe/src/main/java/com/cloudera/impala/catalog/Table.java
M fe/src/main/java/com/cloudera/impala/catalog/TableLoader.java
M fe/src/main/java/com/cloudera/impala/catalog/View.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/main/java/com/cloudera/impala/util/MetaStoreUtil.java
M fe/src/test/java/com/cloudera/impala/catalog/CatalogTest.java
16 files changed, 123 insertions(+), 186 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/81/3381/9
-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 8: Code-Review+1

(1 comment)

Nice!

http://gerrit.cloudera.org:8080/#/c/3381/8/fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
File fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java:

Line 54:   private HiveMetaHookLoader dummyHookLoader = new HiveMetaHookLoader() {
make it final?


-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 8:

(5 comments)

Looks pretty good.

http://gerrit.cloudera.org:8080/#/c/3381/8/fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
File fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java:

PS8, Line 31: RetryingMetaStoreClient
what does it retry? Are operations idempotent under retry?


PS8, Line 32: There is no size limit.
Can you clarify this comment now? There is a size limit to the number of cached connections.


PS8, Line 39: threadpool
threadpool??


Line 54:   private HiveMetaHookLoader dummyHookLoader = new HiveMetaHookLoader() {
> make it final?
and static?


PS8, Line 98: exceed
exceeds


-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 4:

Did some basic profiling with Jmeter on my local machine. The test runs 200 parallel create table actions. Firstly with no. of initial threads set to 5 has throughput of 162/minute and with set to 10, its 252/minute

http://www.dumpt.com/img/viewer.php?file=ktyu6271asjnnnahr512.png (with 5)
http://www.dumpt.com/img/viewer.php?file=rhx0syhkpw1tbynksvsm.png (with 10)

Now I benchmarked with and without the patch, the throughput increased by ~8 queries/minute

http://www.dumpt.com/img/viewer.php?file=rsa6p6mwhng5i3svtcji.png (without patch)
http://www.dumpt.com/img/viewer.php?file=9lgwf5rfxu052crp8zzt.png (with patch)

I think this still requires more finer profiling. Basically in the above setup, Impala limits the no. queries admitted by impalad to 64 at any point (--fe_service_threads) and I think tuning this can give even better results. (or basically profile it by firing to a load balancer, so that all the queries aren't fired to the same impalad)

-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3381

to look at the new patch set (#8).

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................

IMPALA-2347: Reuse metastore client connections in Catalog

Currently we create a new connection to metastore every time Catalog
connects to HMS. This was intentionally done to circumvent HIVE-5181.
Given it is fixed already in Hive, this patch intends to refactor the
HMS client usage on the catalog to reuse the connections. Additionally
this patch makes MetaStoreClient implement AutoCloseable interface and
hence all the callers can use the try-with-resources to create a new
metastore client and needn't explicitly call release(). Also, this
patch increases the default initial metastore pool size to 10 from a
previous value of 5, which is less even for a decent DDL load.

In terms of design, this patch switches the metastore client
implementation to RetryingMetaStoreClient from previous implementation
of HiveMetaStoreClient. The reason for this switch is to handle HMS
failures from Catalog side where the entire metastore client pool
cache becomes stale in the event of a metastore restart and there is
no proper way to deal with it. RetryingMetaStoreClient has inbuilt
retry mechanism which reconnects stale connections in the event of
failures. For more details on retries and corresponding configurations,
check org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.

Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
---
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/Catalog.java
M fe/src/main/java/com/cloudera/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/DataSourceTable.java
M fe/src/main/java/com/cloudera/impala/catalog/HBaseTable.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M fe/src/main/java/com/cloudera/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/com/cloudera/impala/catalog/IncompleteTable.java
M fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java
M fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
M fe/src/main/java/com/cloudera/impala/catalog/Table.java
M fe/src/main/java/com/cloudera/impala/catalog/TableLoader.java
M fe/src/main/java/com/cloudera/impala/catalog/View.java
M fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java
M fe/src/main/java/com/cloudera/impala/util/MetaStoreUtil.java
M fe/src/test/java/com/cloudera/impala/catalog/CatalogTest.java
16 files changed, 119 insertions(+), 184 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/81/3381/8
-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has abandoned this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Abandoned

Thanks for the reviews. Moving it to asf-cr

-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3381/6/fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
File fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java:

PS6, Line 35: 32
Out of curiosity, do we know if HMS has a limit on the number of open connections? If there is, we need to know how close 32 is to that number. We don't want to starve other operations that talk to HMS.


Line 82:       // Ensure the connection isn't returned to the pool if the pool has been closed.
"or if the number of connections in the pool exceed MAX_HMS_CONNECTION_POOL_SIZE."


-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 9: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 7:

Sounds like a good plan :)

-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3381/2//COMMIT_MSG
Commit Message:

Line 13: this patch increases the default intial metastore pool size to 10 from
> typo: initial. Why did you decide to increase this? I have seen connection 
- Like I mentioned in the other comment, earlier every call to a getClient() will create a new MetastoreClient. This patch can ease the behavior a little by re-using the connections as the release() call now puts them in the pool and further getClients() will reuse these connections. 

- Regarding the increase in  initial pool size, I did so as I feel it is very less after looking at how often we use getClient() calls (CatalogOp calls).


http://gerrit.cloudera.org:8080/#/c/3381/2/fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java
File fe/src/main/java/com/cloudera/impala/catalog/MetaStoreClientPool.java:

Line 76:     public void release() {
> Some time I saw connection to be in the hundreds which means this may not b
It could be due to two things, 

1) Like you mentioned, we may not be calling release() properly which is a leak, (please raise a jira if you see code where release() isn't being called.)

2) We don't set a limit on the max no. of connections.  If you see the behavior prior to this patch, we just create a new MetaStoreClient() for every getClient() call. This under heavy load can appear as a lot of connections.


-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-2347: Reuse metastore client connections in Catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-2347: Reuse metastore client connections in Catalog
......................................................................


Patch Set 4:

Dimitris, there is an edge case for this implementation. Check the test added in this commit [1]. That consistently fails with this patch. Reason being, once the HMS is restarted, all the connections in the thread pool become stale and any further operations on them return TTransportException. Although, HMSClient class exposes reconnect() call, there is no way to consistently find out if the other end of the connection actually died (even with TCP keep-alive sockets). So there should be some kind of retry logic from the client side. I can think of three ways. (We didn't hit this prior to the patch because we do a new HMSClient() every-time we call getClient(), which basically makes a connection to the metastore post restart)

1) Call reconnect() every time we poll() a client from the thread pool. This basically defeats the purpose of the patch. 

2) Have a Ping() operation for HMS in MetaStoreClientPool and call reconnect() if it fails. Even this has an overhead depending on how costly the Ping() method is. 

3) Switch to RetryingMetaStoreClient [2] which has an inbuilt retry mechanism on a per metadata call basis. Basically it uses a base HMSClient class and invokes the method with multiple retries and calls reconnect() between retries

I feel the switch to (3) is the best way forward for a long term fix which also adds per operation retries to Impala as well (which can be configured from hive-site.xml settings). Do you see any downside for this? Is there any other way to workaround this problem?

[1] https://github.com/cloudera/Impala/commit/b3c0d57ab1a4da92aadd07c48b09bc989106ddb4
[2] https://github.com/cloudera/hive/blob/cdh5-1.1.0_5.5.0/metastore/src/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java#L81

-- 
To view, visit http://gerrit.cloudera.org:8080/3381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I517c0e1efef2584cd8d34017b33574f2ad69bd52
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Huaisi Xu <hx...@cloudera.com>
Gerrit-HasComments: No