Posted to reviews@impala.apache.org by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org> on 2017/04/15 01:04:02 UTC

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Bharath Vissapragada has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6651

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................

IMPALA-4943: Speed up block md loading for add/recover partition calls.

This change makes alter table add/recover partitions calls use the
per directory block metadata loading routines instead of doing it
per file. This is done since these calls always load the entire
partition directory from scratch and there is no advantage in
loading them incrementally on a per-file basis.
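
To illustrate why a directory-level load might be cheaper for a brand-new partition, here is a minimal Python sketch. It is not Impala's Java code: `FakeFileSystem`, `load_per_file`, and `load_per_directory` are hypothetical names, and the assumption that every file system call costs one round trip is an illustration, not something stated in the commit message.

```python
from dataclasses import dataclass


@dataclass
class FakeFileSystem:
    """Hypothetical in-memory stand-in for a remote file system.

    Every method increments `calls` to model one round trip (an assumption).
    """
    # Maps directory -> {file name -> list of block ids}.
    dirs: dict
    calls: int = 0

    def list_files(self, directory):
        """Return file names only (one simulated round trip)."""
        self.calls += 1
        return sorted(self.dirs[directory])

    def get_block_locations(self, directory, name):
        """Return the blocks of a single file (one simulated round trip)."""
        self.calls += 1
        return list(self.dirs[directory][name])

    def list_files_with_blocks(self, directory):
        """Return every file with its blocks (one simulated round trip)."""
        self.calls += 1
        return {name: list(blocks)
                for name, blocks in self.dirs[directory].items()}


def load_per_file(fs, directory):
    """Per-file loading: one listing plus one block lookup per file."""
    return {name: fs.get_block_locations(directory, name)
            for name in fs.list_files(directory)}


def load_per_directory(fs, directory):
    """Per-directory loading: a single bulk call for the whole partition."""
    return fs.list_files_with_blocks(directory)


def demo():
    """Load the same ten-file partition both ways and count the calls."""
    layout = {"/tbl/p=1": {f"file{i}": [f"blk{i}"] for i in range(10)}}
    per_file_fs = FakeFileSystem(layout)
    per_dir_fs = FakeFileSystem(layout)
    same = (load_per_file(per_file_fs, "/tbl/p=1")
            == load_per_directory(per_dir_fs, "/tbl/p=1"))
    return same, per_file_fs.calls, per_dir_fs.calls
```

With ten files in the partition, `demo()` reports 11 simulated calls for the per-file path and one for the per-directory path, while both return the same metadata. For a refresh of an existing partition, where most files are unchanged, the trade-off may differ.
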

Tests: Ran core tests and the metadata benchmark tests.

(I) Improvement: METADATA-BENCHMARKS()
100K-PARTITIONS-1M-FILES-03-RECOVER [text / none / none] (718.62s ->
549.91s [-23.48%])

(I) Improvement: METADATA-BENCHMARKS()
100K-PARTITIONS-1M-FILES-08-ADD-PARTITION [text / none / none] (46.92s
-> 26.20s [-44.15%])

Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
1 file changed, 19 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/6651/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6651

to look at the new patch set (#4).

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................

IMPALA-4943: Speed up block md loading for add/recover partition calls.

This change makes alter table add/recover partitions calls use the
per directory block metadata loading routines instead of doing it
per file. This is done since these calls always load the entire
partition directory from scratch and there is no advantage in
loading them incrementally on a per-file basis.

Tests: Ran core tests and the metadata benchmark tests.

(I) Improvement: METADATA-BENCHMARKS()
100K-PARTITIONS-1M-FILES-03-RECOVER [text / none / none] (718.62s ->
549.91s [-23.48%])

(I) Improvement: METADATA-BENCHMARKS()
100K-PARTITIONS-1M-FILES-08-ADD-PARTITION [text / none / none] (46.92s
-> 26.20s [-44.15%])

Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
1 file changed, 20 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/6651/4

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 5:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/489/


Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 3: Code-Review+2

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6651/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS3, Line 838: block
file


PS3, Line 1530: block
file


PS3, Line 1529: code
              :    * path
function


PS3, Line 1530: , like
              :    * adding a new file to an existing partition
remove



Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 4: Code-Review+2

Carrying +2. Holding off GVO till other priority fixes are in.


Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6651/2//COMMIT_MSG
Commit Message:

Line 17: (I) Improvement: METADATA-BENCHMARKS()
How do the numbers compare to pre-IMPALA-4172/IMPALA-3653? Is the regression completely addressed?



Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 5: Code-Review+2


Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6651/2//COMMIT_MSG
Commit Message:

Line 17: (I) Improvement: METADATA-BENCHMARKS()
> Tried triggering a perf build on pre-IMPALA-4172 branch. It doesn't work du
Thank you! Looks like we can call the regression fixed.



Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6651/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS1, Line 919: loadPartitionBlockMdFromScratch
I am not sure this is correct. createAndLoadPartition() is called during partition refresh. Why do we want to load metadata from scratch in this case? Are you sure we don't regress this case?



Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 5: Verified+1


Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has uploaded a new patch set (#2).

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................

IMPALA-4943: Speed up block md loading for add/recover partition calls.

This change makes alter table add/recover partitions calls use the
per directory block metadata loading routines instead of doing it
per file. This is done since these calls always load the entire
partition directory from scratch and there is no advantage in
loading them incrementally on a per-file basis.

Tests: Ran core tests and the metadata benchmark tests.

(I) Improvement: METADATA-BENCHMARKS()
100K-PARTITIONS-1M-FILES-03-RECOVER [text / none / none] (718.62s ->
549.91s [-23.48%])

(I) Improvement: METADATA-BENCHMARKS()
100K-PARTITIONS-1M-FILES-08-ADD-PARTITION [text / none / none] (46.92s
-> 26.20s [-44.15%])

Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
1 file changed, 21 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/6651/2

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6651/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS2, Line 839: already
> remove
Done


PS2, Line 842: loadPartitionBlockMdFromScratc
> Refreshing an existing partition calls the refreshFileMetadata() while load
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 5:

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/489/


Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 5:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/483/


Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


IMPALA-4943: Speed up block md loading for add/recover partition calls.

This change makes alter table add/recover partitions calls use the
per directory block metadata loading routines instead of doing it
per file. This is done since these calls always load the entire
partition directory from scratch and there is no advantage in
loading them incrementally on a per-file basis.

Tests: Ran core tests and the metadata benchmark tests.

(I) Improvement: METADATA-BENCHMARKS()
100K-PARTITIONS-1M-FILES-03-RECOVER [text / none / none] (718.62s ->
549.91s [-23.48%])

(I) Improvement: METADATA-BENCHMARKS()
100K-PARTITIONS-1M-FILES-08-ADD-PARTITION [text / none / none] (46.92s
-> 26.20s [-44.15%])

Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Reviewed-on: http://gerrit.cloudera.org:8080/6651
Reviewed-by: Bharath Vissapragada <bh...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
1 file changed, 20 insertions(+), 5 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Bharath Vissapragada: Looks good to me, approved




Gerrit-MessageType: merged
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6651/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS3, Line 838: block
> file
Done


PS3, Line 1529: code
              :    * path
> function
Done


PS3, Line 1530: block
> file
Done


PS3, Line 1530: , like
              :    * adding a new file to an existing partition
> remove
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 2:

Our metadata benchmark doesn't have "refresh partition" calls yet, which is why the regression wasn't caught. I discussed with Mostafa adding one to the list of queries.


Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 5:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/491/


Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has uploaded a new patch set (#3).

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................

IMPALA-4943: Speed up block md loading for add/recover partition calls.

This change makes alter table add/recover partitions calls use the
per directory block metadata loading routines instead of doing it
per file. This is done since these calls always load the entire
partition directory from scratch and there is no advantage in
loading them incrementally on a per-file basis.

Tests: Ran core tests and the metadata benchmark tests.

(I) Improvement: METADATA-BENCHMARKS()
100K-PARTITIONS-1M-FILES-03-RECOVER [text / none / none] (718.62s ->
549.91s [-23.48%])

(I) Improvement: METADATA-BENCHMARKS()
100K-PARTITIONS-1M-FILES-08-ADD-PARTITION [text / none / none] (46.92s
-> 26.20s [-44.15%])

Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
1 file changed, 21 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/6651/3

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6651/2//COMMIT_MSG
Commit Message:

Line 17: (I) Improvement: METADATA-BENCHMARKS()
> How do the numbers compare to pre-IMPALA-4172/IMPALA-3653? Is the regressio
Tried triggering a perf build on pre-IMPALA-4172 branch. It doesn't work due to toolchain changes. With Mostafa's help, I manually collected data for the 2.9.0 release builds from Tableau, and it looks like there are no regressions. The last column corresponds to the 2.9.0 release and the one before it to master. (The tests have probably changed a little, so this might not be an apples-to-apples comparison, but it is the best I could gather.)

Workload              | Test                                                    | Table format       | master (s) | 2.9.0 (s)
METADATA-BENCHMARKS() | 100K-PARTITIONS-1M-FILES-10-REFRESH-AFTER-ADD-PARTITION | text / none / none | 220.59     | 232.525651932
METADATA-BENCHMARKS() | 100K-PARTITIONS-1M-FILES-05-QUERY-AFTER-INV             | text / none / none | 157.06     | 167.791799068
METADATA-BENCHMARKS() | 100K-PARTITIONS-1M-FILES-11-DROP-PARTITION              | text / none / none | 21.35      | 29.54668593
METADATA-BENCHMARKS() | 100K-PARTITIONS-1M-FILES-03-RECOVER                     | text / none / none | 549.91     | 787.940558195
METADATA-BENCHMARKS() | 100K-PARTITIONS-1M-FILES-09-INVALIDATE                  | text / none / none | 5.18       | 6.843062162
METADATA-BENCHMARKS() | 100K-PARTITIONS-1M-FILES-08-ADD-PARTITION               | text / none / none | 26.20      | 48.448208809
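
The relative difference between the two columns can be computed with a few lines of Python. This is a sketch for illustration only: the `relative_change` and `report` helpers are hypothetical, the figures are copied from the table above, and treating 2.9.0 as the baseline is an assumption about how the comparison is meant to be read.

```python
def relative_change(baseline, current):
    """Return the change from baseline to current as a percentage."""
    return (current - baseline) / baseline * 100.0


# (test name, master seconds, 2.9.0 seconds), copied from the table.
RESULTS = [
    ("10-REFRESH-AFTER-ADD-PARTITION", 220.59, 232.525651932),
    ("05-QUERY-AFTER-INV", 157.06, 167.791799068),
    ("11-DROP-PARTITION", 21.35, 29.54668593),
    ("03-RECOVER", 549.91, 787.940558195),
    ("09-INVALIDATE", 5.18, 6.843062162),
    ("08-ADD-PARTITION", 26.20, 48.448208809),
]


def report(results):
    """Format one line per test, treating 2.9.0 as the baseline."""
    return [f"{name}: {relative_change(release, master):+.1f}%"
            for name, master, release in results]


if __name__ == "__main__":
    print("\n".join(report(RESULTS)))
```

A negative value means master is faster than 2.9.0 for that test. Every row comes out negative, which is consistent with the "no regressions" reading, subject to the same caveat that the tests may have changed between the two runs.
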



Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 5:

I discussed this with Alex, and it looks like one of the known flaky issues (IMPALA-5177/IMPALA-4845). I ran a couple of private jobs overnight and both of them passed. This patch may have increased the likelihood of the error, especially in ubuntu-14.04-from-scratch, which the GVO uses. I'll try to trigger another job and see if that works. If not, I'll debug further into the HMS side of things.


Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 5: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/483/


Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6651/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS1, Line 919: loadPartitionBlockMdFromScratch
> I am not sure this is correct. createAndLoadPartition() is called during pa
Oops. Missed that caller. reloadPartition() should call refreshFileMetadata() instead. Done.



Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls.
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6651/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS2, Line 839: already
remove


PS2, Line 842: loadPartitionBlockMdFromScratc
Refreshing an existing partition calls the refreshFileMetadata() while loading a partition from scratch calls loadPartitionBlockMdFromScratch(). Both functions load file/block metadata, so maybe we should name the latter loadFileMetadataFromScratch()?
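
The two paths being named here can be contrasted in a short Python sketch. It only illustrates the incremental-versus-from-scratch idea and is not the code in HdfsTable.java: the function names mirror the ones discussed in this thread, but the bodies, the `(size, mtime)` change check, and the `load_blocks` callback are all assumptions.

```python
def load_file_metadata_from_scratch(listing, load_blocks):
    """Load block metadata for every listed file; no cached state is used."""
    return {name: (stat, load_blocks(name)) for name, stat in listing.items()}


def refresh_file_metadata(cached, listing, load_blocks):
    """Reload only files whose (size, mtime) stat differs from the cache."""
    refreshed = {}
    for name, stat in listing.items():
        old = cached.get(name)
        if old is not None and old[0] == stat:
            refreshed[name] = old  # unchanged file: reuse cached blocks
        else:
            refreshed[name] = (stat, load_blocks(name))
    return refreshed


def demo():
    """Run both paths and record which files had their blocks loaded."""
    loaded = []

    def load_blocks(name):
        # Hypothetical loader that records each file it is asked about.
        loaded.append(name)
        return [f"{name}-blk0"]

    # New partition: nothing is cached, so every file is loaded.
    listing = {"a": (10, 1), "b": (20, 1)}
    cached = load_file_metadata_from_scratch(listing, load_blocks)
    from_scratch_loads = list(loaded)

    # Refresh: file "b" changed and file "c" appeared; "a" is reused.
    loaded.clear()
    listing = {"a": (10, 1), "b": (25, 2), "c": (5, 2)}
    refreshed = refresh_file_metadata(cached, listing, load_blocks)
    return from_scratch_loads, list(loaded), sorted(refreshed)
```

In this sketch the from-scratch path loads blocks for both files, while the refresh path loads them only for the changed and the new file. Under that reading, both functions load file and block metadata and differ only in whether cached state is consulted.
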



Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-HasComments: Yes