You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org> on 2017/01/13 18:13:10 UTC

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5710

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................

IMPALA-4449: Revisit table locking pattern in the catalog

This commit fixes an issue where multiple long-running operations on the
same catalog object (e.g. table) can block other catalog operations from
making progress.

Problem:
IMPALA-1480 introduced table level locking that in conjunction with the
global catalog lock ensures serialized access to catalog table objects.
In some cases (e.g. multiple long running operations on same table), the
locking pattern used resulted in the catalog lock being held for
a long period of time, thus blocking other catalog operations from making
any progress. That resulted in high response times and the system
appearing to be hung.

Solution:
Change the locking pattern in the catalog for protecting table objects
so that no operation will hold the catalog lock for a long time if it
fails to acquire a table lock. The operation that attempts to acquire a
table lock and fails to do so must release the catalog lock and retry.
The use of fair locks prevent starvation from happening. The only
operation that doesn't follow this retry logic is the
getAllCatalogObjects() call that retrieves a snapshot of the catalog
metadata for transmitting to the statestore.

Testing:
I manually tested this change by running concurrency tests using JMeter
and verified that the throughput of catalog operations on a specific table
is not affected by other concurrent long running operations (e.g. refresh)
on a different table.

Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
3 files changed, 533 insertions(+), 437 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/5710/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#2).

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................

IMPALA-4449: Revisit table locking pattern in the catalog

This commit fixes an issue where multiple long-running operations on the
same catalog object (e.g. table) can block other catalog operations from
making progress.

Problem:
IMPALA-1480 introduced table level locking that in conjunction with the
global catalog lock ensures serialized access to catalog table objects.
In some cases (e.g. multiple long running operations on same table), the
locking pattern used resulted in the catalog lock being held for
a long period of time, thus blocking other catalog operations from making
any progress. That resulted in high response times and the system
appearing to be hung.

Solution:
Change the locking pattern in the catalog for protecting table objects
so that no operation will hold the catalog lock for a long time if it
fails to acquire a table lock. The operation that attempts to acquire a
table lock and fails to do so must release the catalog lock and retry.
The use of fair locks prevent starvation from happening. The only
operation that doesn't follow this retry logic is the
getCatalogObjects() call that retrieves a snapshot of the catalog
metadata for transmitting to the statestore.

Testing:
I manually tested this change by running concurrency tests using JMeter
and verified that the throughput of catalog operations on a specific table
is not affected by other concurrent long running operations (e.g. refresh)
on a different table.

Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
3 files changed, 169 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/5710/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5710

to look at the new patch set (#5).

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................

IMPALA-4449: Revisit table locking pattern in the catalog

This commit fixes an issue where multiple long-running operations on the
same catalog object (e.g. table) can block other catalog operations from
making progress.

Problem:
IMPALA-1480 introduced table level locking that in conjunction with the
global catalog lock ensures serialized access to catalog table objects.
In some cases (e.g. multiple long running operations on same table), the
locking pattern used resulted in the catalog lock being held for
a long period of time, thus blocking other catalog operations from making
any progress. That resulted in high response times and the system
appearing to be hung.

Solution:
Change the locking pattern in the catalog for protecting table objects
so that no operation will hold the catalog lock for a long time if it
fails to acquire a table lock. The operation that attempts to acquire a
table lock and fails to do so must release the catalog lock and retry.
The use of fair locks prevent starvation from happening. The only
operation that doesn't follow this retry logic is the
getCatalogObjects() call that retrieves a snapshot of the catalog
metadata for transmitting to the statestore.

Testing:
I manually tested this change by running concurrency tests using JMeter
and verified that the throughput of catalog operations on a specific table
is not affected by other concurrent long running operations (e.g. refresh)
on a different table.

Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
3 files changed, 183 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/5710/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 5: Code-Review+2

Rebase and carry Marcel's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/5710/1//COMMIT_MSG
Commit Message:

PS1, Line 29: getAllCatalogObjects
getCatalogObjects()?


http://gerrit.cloudera.org:8080/#/c/5710/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 926:     while (true) {
Regarding supportability, should we log a debug message/expose a average wait time metric via JMX if its stuck in these loops for longer than a configured value? IMO that makes it easier to debug queries hung at Catalog operations. (Same comment for other methods that follow this pattern, just wanted to hear you thoughts)


Line 928:       if (tbl.getLock().writeLock().tryLock()) {
To avoid extra branch, may be we could refactor to something like

catalogLock_.writeLock().lock();
if (!tbl.getLock().writeLock().tryLock()) {
  catalogLock_.writeLock().unlock();
  continue;
}
.
.
.
(Same comment for other places that uses this pattern)


PS1, Line 950: continue;
redundant?


Line 1305:           hdfsTable.setCatalogVersion(newCatalogVersion);
- Just curious, is this an existing bug that we didn't set it if the partition exists?

- May be we can move this line after L1285 before the try block since we are setting for both cases anyway and remove it at L1296?


Line 1312:         continue;
redundant?


http://gerrit.cloudera.org:8080/#/c/5710/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 190:  * }
I think we should document the exception of getCatalogObjects() here.


-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5710/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 975:       tbl.getLock().unlock();
where do we unlock the catalog write-lock?


-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 2:

Marcel, can you plz look at patch set 2 instead. Several things changed and your comments may not be relevant any more.

-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5710/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 185:   private static final long TBL_LOCK_TIMEOUT_MS = 3600000;
> Given that certain table operations (e.g. refresh) can take a lot of time d
let's leave as-is for now, since this patch is a clear improvement over current behavior.

but in general blocking for an hour isn't a great idea. so i was wrong earlier, and we should add a query option so that the application can decide on how long to wait.

would be good to do that as a follow-on. or do it now if you want.


-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 5:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/200/

-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/5710/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 184:   private static final long TIMEOUT_FOR_TBL_LOCK_IN_MSEC = 3600000;
an hour?

also, this doesn't need to be a sentence. something like tbl_lock_timeout_msec is equally informative.


Line 189:    * false otherwise.
point out which locks are held on return in both scenarios.


Line 191:   public boolean tryToAcquireCatalogLocks(Table tbl) {
how about something briefer like tryLockTable?


Line 205:       Thread.yield();
comment on significance of this. i'm assuming you need it to prevent spinning here.


Line 321:           tbl.getLock().lock();
does it make sense to have rw locks for tables as well? sounds like that might speed up 'show tables'.


http://gerrit.cloudera.org:8080/#/c/5710/2/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 71:   // Lock protecting this table
explain why this is needed (why do we reacquire the lock when we already have it?). fine to do in a different class that already explains the overall locking pattern.


http://gerrit.cloudera.org:8080/#/c/5710/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 173:  * ensures that the catalog lock is never held for a long period of time, preventing
this pattern only applies to single-table update operations, worth pointing out (which then also explains why getCatalogObjects() is different)


Line 3127:                 for (org.apache.hadoop.hive.metastore.api.Partition part:
single line


-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5710/2/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 72:   private final ReentrantLock tableLock_ = new ReentrantLock(true);
> We need a fair lock because you may have relatively short operations on a t
As discussed, we use tryLock() to not block on this lock, but the fairness aspect pertains to the lock-acquisition order of threads that are blocked, waiting for this lock. So a fair lock does not make sense here, and we cannot prevent starvation.


-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


IMPALA-4449: Revisit table locking pattern in the catalog

This commit fixes an issue where multiple long-running operations on the
same catalog object (e.g. table) can block other catalog operations from
making progress.

Problem:
IMPALA-1480 introduced table level locking that in conjunction with the
global catalog lock ensures serialized access to catalog table objects.
In some cases (e.g. multiple long running operations on same table), the
locking pattern used resulted in the catalog lock being held for
a long period of time, thus blocking other catalog operations from making
any progress. That resulted in high response times and the system
appearing to be hung.

Solution:
Change the locking pattern in the catalog for protecting table objects
so that no operation will hold the catalog lock for a long time if it
fails to acquire a table lock. The operation that attempts to acquire a
table lock and fails to do so must release the catalog lock and retry.
The use of fair locks prevent starvation from happening. The only
operation that doesn't follow this retry logic is the
getCatalogObjects() call that retrieves a snapshot of the catalog
metadata for transmitting to the statestore.

Testing:
I manually tested this change by running concurrency tests using JMeter
and verified that the throughput of catalog operations on a specific table
is not affected by other concurrent long running operations (e.g. refresh)
on a different table.

Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Reviewed-on: http://gerrit.cloudera.org:8080/5710
Reviewed-by: Dimitris Tsirogiannis <dt...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
3 files changed, 183 insertions(+), 73 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dimitris Tsirogiannis: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5710/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 321:           tbl.getLock().lock();
> does it make sense to have rw locks for tables as well? sounds like that mi
never mind, this is only catalogd logic.


-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5710/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 184:   private static final long TIMEOUT_FOR_TBL_LOCK_IN_MSEC = 3600000;
We usually use the MS suffix elsewhere, i.e.,

TBL_LOCK_TIMEOUT_MS

I think we should consider making this higher and/or even configurable (in a separate patch).


Line 187:    * Tries to acquire a table lock while holding the catalogLock_. Returns true if it
Tries to acquire the catalogLock_ and the lock of 'tbl' in that order.


Line 205:       Thread.yield();
I believe this will call sched_yield() on Linux, so may still end up with busy-spinning-like behavior or at least unpredictable "waiting" times. What's wrong with Thread.sleep()?


http://gerrit.cloudera.org:8080/#/c/5710/2/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 72:   private final ReentrantLock tableLock_ = new ReentrantLock(true);
Thinking about this a little more: Why does this lock need to be fair? We don't really wait for it and instead we have a try+retry loop.

I'm mostly asking because forcing fairness could mislead readers if that feature is not really needed/used.

I think preventing starvation with a try+retry approach will be hard.


-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#3).

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................

IMPALA-4449: Revisit table locking pattern in the catalog

This commit fixes an issue where multiple long-running operations on the
same catalog object (e.g. table) can block other catalog operations from
making progress.

Problem:
IMPALA-1480 introduced table level locking that in conjunction with the
global catalog lock ensures serialized access to catalog table objects.
In some cases (e.g. multiple long running operations on same table), the
locking pattern used resulted in the catalog lock being held for
a long period of time, thus blocking other catalog operations from making
any progress. That resulted in high response times and the system
appearing to be hung.

Solution:
Change the locking pattern in the catalog for protecting table objects
so that no operation will hold the catalog lock for a long time if it
fails to acquire a table lock. The operation that attempts to acquire a
table lock and fails to do so must release the catalog lock and retry.
The use of fair locks prevent starvation from happening. The only
operation that doesn't follow this retry logic is the
getCatalogObjects() call that retrieves a snapshot of the catalog
metadata for transmitting to the statestore.

Testing:
I manually tested this change by running concurrency tests using JMeter
and verified that the throughput of catalog operations on a specific table
is not affected by other concurrent long running operations (e.g. refresh)
on a different table.

Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
3 files changed, 170 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/5710/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5710/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 184:   // TODO: Make this parameter configurable
why? we want to restructure the entire object and locking hierarchy anyway.


Line 185:   private static final long TBL_LOCK_TIMEOUT_MS = 3600000;
this is still an hour. i don't think that's a reasonable timeout value.


Line 977:       tbl.getLock().unlock();
could you add a precondition into all of the finally branches that make sure that the catalog lock isn't being held?


http://gerrit.cloudera.org:8080/#/c/5710/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 194:  * Note: The getCatalogObjects() function is the only case where this locking pattern is
it's not the only case *in principle* (maybe it is right now), because every operation that needs to touch multiple tables would also need a different locking pattern.


-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5710/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 185:   private static final long TBL_LOCK_TIMEOUT_MS = 3600000;
> let's leave as-is for now, since this patch is a clear improvement over cur
perfect, I agree with approach


-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 1:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/5710/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 72:   // Lock protecting this table
// Fair lock protecting this table.

Why not a ReentrantLock?


http://gerrit.cloudera.org:8080/#/c/5710/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 172:  * consistently in the presence of concurrent DDL operations. This pattern ensures that
This -> The following?


Line 175:  * prevent starvation from happening.
remove "from happening"


Line 182:  *     CONTINUE
Looks like we might busy spin for a long time. Consider adding a wait.


Line 183:  *   } ELSE {
remove ELSE block to simplify?


Line 357:     while (true) {
Comment why we are looping forever.

Do you think it makes sense to consider a very conservative timeout for the overall operation? It seems a little scary to have a thread busy spinning forever.

If we get into a pathological/buggy scenario, restarting the catalogd may be the only remedy.


Line 359:       if (tbl.getLock().writeLock().tryLock()) {
Reverse the logic like in your snippet in the class comment.


Line 361:           if (params.getAlter_type() == TAlterTableType.RENAME_VIEW
Can we move this inner logic into a separate function? It would be nice to see the locking and retry logic in a small snippet of code.


Line 652:       if (tbl.getLock().writeLock().tryLock()) {
same here with reversing the logic


Line 1456:     while (true) {
Instead of replicating the lock crabbing+retry logic in several places, can you think of a reasonable way to abstract it out? Just asking for a discussion, maybe it doesn't make sense.


-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 1:

(3 comments)

some preliminary comments

http://gerrit.cloudera.org:8080/#/c/5710/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 294:           tbl.getLock().writeLock().lock();
explain why write lock (for what appears to be a read operation). do this in (some) class comment.


Line 928:       if (tbl.getLock().writeLock().tryLock()) {
is this busy-spinning? i don't see any timeouts here


http://gerrit.cloudera.org:8080/#/c/5710/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 181:  *     Release the catalog lock
doing it this way around might actually make the code more readable.


-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 2:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/5710/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 184:   private static final long TIMEOUT_FOR_TBL_LOCK_IN_MSEC = 3600000;
> an hour?
Done


Line 184:   private static final long TIMEOUT_FOR_TBL_LOCK_IN_MSEC = 3600000;
> We usually use the MS suffix elsewhere, i.e.,
Renamed it. How much higher? :)


Line 187:    * Tries to acquire a table lock while holding the catalogLock_. Returns true if it
> Tries to acquire the catalogLock_ and the lock of 'tbl' in that order.
Done


Line 189:    * false otherwise.
> point out which locks are held on return in both scenarios.
Done


Line 191:   public boolean tryToAcquireCatalogLocks(Table tbl) {
> how about something briefer like tryLockTable?
Hm, that's what I had in the beginning. Wanted to indicate that we also acquire the catalogLock in this function. Anyways, I renamed it. LMK if that works better.


Line 205:       Thread.yield();
> comment on significance of this. i'm assuming you need it to prevent spinni
Done


Line 205:       Thread.yield();
> I believe this will call sched_yield() on Linux, so may still end up with b
Sleep for how long?


Line 321:           tbl.getLock().lock();
> does it make sense to have rw locks for tables as well? sounds like that mi
Done


Line 321:           tbl.getLock().lock();
> never mind, this is only catalogd logic.
Done


Line 975:       tbl.getLock().unlock();
> where do we unlock the catalog write-lock?
In L959.


http://gerrit.cloudera.org:8080/#/c/5710/2/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 71:   // Lock protecting this table
> explain why this is needed (why do we reacquire the lock when we already ha
Hm, what do you mean by "reacquire the lock when we already have it"? Which case do your refer to that this occurs?


Line 72:   private final ReentrantLock tableLock_ = new ReentrantLock(true);
> Thinking about this a little more: Why does this lock need to be fair? We d
We need a fair lock because you may have relatively short operations on a table blocked forever by long running operations such as refresh on the same table. The retry logic doesn't guarantee that each operation will make progress eventually.


http://gerrit.cloudera.org:8080/#/c/5710/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 173:  * ensures that the catalog lock is never held for a long period of time, preventing
> this pattern only applies to single-table update operations, worth pointing
Good point. Updated the comment.


Line 3127:                 for (org.apache.hadoop.hive.metastore.api.Partition part:
> single line
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 1:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/5710/1//COMMIT_MSG
Commit Message:

PS1, Line 29: getAllCatalogObjects
> getCatalogObjects()?
Done


http://gerrit.cloudera.org:8080/#/c/5710/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 926:     while (true) {
> Regarding supportability, should we log a debug message/expose a average wa
Added a trace message for now.


Line 928:       if (tbl.getLock().writeLock().tryLock()) {
> To avoid extra branch, may be we could refactor to something like
Done


PS1, Line 950: continue;
> redundant?
Yes, removed.


Line 1305:           hdfsTable.setCatalogVersion(newCatalogVersion);
> - Just curious, is this an existing bug that we didn't set it if the partit
Hm, why do we need to update the table version if the partition already exists?


Line 1312:         continue;
> redundant?
Yes, removed.


http://gerrit.cloudera.org:8080/#/c/5710/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 72:   // Lock protecting this table
> // Fair lock protecting this table.
Done


http://gerrit.cloudera.org:8080/#/c/5710/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 172:  * consistently in the presence of concurrent DDL operations. This pattern ensures that
> This -> The following?
Done


Line 175:  * prevent starvation from happening.
> remove "from happening"
Done


Line 182:  *     CONTINUE
> Looks like we might busy spin for a long time. Consider adding a wait.
Done


Line 183:  *   } ELSE {
> remove ELSE block to simplify?
Done


Line 190:  * }
> I think we should document the exception of getCatalogObjects() here.
Done


Line 357:     while (true) {
> Comment why we are looping forever.
Added an 1hr timeout. Let me know if you believe a larger one is required.


Line 359:       if (tbl.getLock().writeLock().tryLock()) {
> Reverse the logic like in your snippet in the class comment.
Done


Line 361:           if (params.getAlter_type() == TAlterTableType.RENAME_VIEW
> Can we move this inner logic into a separate function? It would be nice to 
Done


Line 652:       if (tbl.getLock().writeLock().tryLock()) {
> same here with reversing the logic
Done


Line 1456:     while (true) {
> Instead of replicating the lock crabbing+retry logic in several places, can
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5710/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 184:   // TODO: Make this parameter configurable
> why? we want to restructure the entire object and locking hierarchy anyway.
Removed.


Line 185:   private static final long TBL_LOCK_TIMEOUT_MS = 3600000;
> this is still an hour. i don't think that's a reasonable timeout value.
Given that certain table operations (e.g. refresh) can take a lot of time don't you think it's reasonable to have a timeout like this? I am open to suggestions...


Line 196:       catalogLock_.writeLock().lock();
> - The only pattern I see for taking this write lock seems to be to be to ca
Good suggestion, but there is a place where we can't do that because the catalog lock is not only held for incrementing the version. See table and view rename.


Line 977:       tbl.getLock().unlock();
> could you add a precondition into all of the finally branches that make sur
Done


http://gerrit.cloudera.org:8080/#/c/5710/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 194:  * Note: The getCatalogObjects() function is the only case where this locking pattern is
> it's not the only case *in principle* (maybe it is right now), because ever
Yes, what I mean is that this is the only case where both catalogLock_ and table locks are acquired and this pattern is not applied. Let me know if you prefer this comment rewritten.


Line 384:       catalog_.getLock().writeLock().unlock();
> We don't seem to be write-locking this after L378. Is this really required?
We are if this is not a rename operation.


-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5710/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 196:       catalogLock_.writeLock().lock();
- The only pattern I see for taking this write lock seems to be to be to call incrementAndGetCatalogVersion() and then we immediately unlock() it.(Please correct me if I'm wrong).

- If it is the only case, can we call incrementAndGetCatalogVersion() below and make this method return the new version? Something like tryLockTableAndGetNewVersion()? That prevents the callers from accidentally missing a catalogLock_.writeLock().unlock()?

- That also addresses Marcel's comment about adding a Precondition check for each block.

- Not totally sure if this is a good idea but wanted to put it up for discussion.


http://gerrit.cloudera.org:8080/#/c/5710/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 384:       catalog_.getLock().writeLock().unlock();
We don't seem to be write-locking this after L378. Is this really required?


-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................


Patch Set 5: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4449: Revisit table locking pattern in the catalog

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has uploaded a new patch set (#4).

Change subject: IMPALA-4449: Revisit table locking pattern in the catalog
......................................................................

IMPALA-4449: Revisit table locking pattern in the catalog

This commit fixes an issue where multiple long-running operations on the
same catalog object (e.g. table) can block other catalog operations from
making progress.

Problem:
IMPALA-1480 introduced table level locking that in conjunction with the
global catalog lock ensures serialized access to catalog table objects.
In some cases (e.g. multiple long running operations on same table), the
locking pattern used resulted in the catalog lock being held for
a long period of time, thus blocking other catalog operations from making
any progress. That resulted in high response times and the system
appearing to be hung.

Solution:
Change the locking pattern in the catalog for protecting table objects
so that no operation will hold the catalog lock for a long time if it
fails to acquire a table lock. The operation that attempts to acquire a
table lock and fails to do so must release the catalog lock and retry.
The use of fair locks prevent starvation from happening. The only
operation that doesn't follow this retry logic is the
getCatalogObjects() call that retrieves a snapshot of the catalog
metadata for transmitting to the statestore.

Testing:
I manually tested this change by running concurrency tests using JMeter
and verified that the throughput of catalog operations on a specific table
is not affected by other concurrent long running operations (e.g. refresh)
on a different table.

Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
3 files changed, 182 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/5710/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5710
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id08e21da31deb1f003b3cada4517651f3b3b2bb2
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>