You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2019/01/22 23:24:19 UTC

[kudu-CR] [spark] Add some logging to trace KuduContext operations

Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12252


Change subject: [spark] Add some logging to trace KuduContext operations
......................................................................

[spark] Add some logging to trace KuduContext operations

This patch adds some logging to help track how long an entire
KuduContext operation takes and also how long each part takes on each
executor. This information has been sorely lacking in some cases where
Spark's laziness makes attributing slowness to Kudu (vs other components
of the Spark job) very difficult.

Unfortunately, it's not as straightforward to add this sort of logging
to reading from Kudu (KuduRDD) because Spark may lazily read batches
from Kudu, and batches may be small enough that logging for each batch
is so verbose that it is not useful.

I tested this patch manually on a 3-node cluster and confirmed I saw the
expected log messages on the driver and on the executors, e.g.

19/01/22 15:18:13 INFO kudu.KuduContext: applying operations of type 'insert' to table 'impala::default.aaa'
19/01/22 15:18:13 INFO kudu.KuduContext: applied 1 operations of type 'insert' to table 'impala::default.aaa'

Change-Id: I6741f2584c1bc6b229d10d37297515474318f94c
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
2 files changed, 26 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/12252/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6741f2584c1bc6b229d10d37297515474318f94c
Gerrit-Change-Number: 12252
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark] Add some logging to trace KuduContext operations

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has removed a vote on this change.

Change subject: [spark] Add some logging to trace KuduContext operations
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/12252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I6741f2584c1bc6b229d10d37297515474318f94c
Gerrit-Change-Number: 12252
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark] Add some logging to trace KuduContext operations

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/12252 )

Change subject: [spark] Add some logging to trace KuduContext operations
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12252/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/12252/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@246
PS1, Line 246: log.info
> Maybe, move this into writeRows() itself?  It seems there are enough parame
There's two reasons I made it this way:

1. Nicer logs. Insert/upsert/update/delete conjugate differently and use different prepositions, e.g. "inserted" (add -ed) vs. "deleted" (add -d), and "insert into" vs. "delete from".
2. This doesn't feel worse to me than two big `match` statements in `writeRows` to do the logging cleanly.

If it bugs you that much, feel free to make a follow-up that changes it :)



-- 
To view, visit http://gerrit.cloudera.org:8080/12252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6741f2584c1bc6b229d10d37297515474318f94c
Gerrit-Change-Number: 12252
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 23 Jan 2019 19:57:52 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] Add some logging to trace KuduContext operations

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12252 )

Change subject: [spark] Add some logging to trace KuduContext operations
......................................................................

[spark] Add some logging to trace KuduContext operations

This patch adds some logging to help track how long an entire
KuduContext operation takes and also how long each part takes on each
executor. This information has been sorely lacking in some cases where
Spark's laziness makes attributing slowness to Kudu (vs other components
of the Spark job) very difficult.

Unfortunately, it's not as straightforward to add this sort of logging
to reading from Kudu (KuduRDD) because Spark may lazily read batches
from Kudu, and batches may be small enough that logging for each batch
is so verbose that it is not useful.

I tested this patch manually on a 3-node cluster and confirmed I saw the
expected log messages on the driver and on the executors, e.g.

19/01/22 15:18:13 INFO kudu.KuduContext: applying operations of type 'insert' to table 'impala::default.aaa'
19/01/22 15:18:13 INFO kudu.KuduContext: applied 1 operations of type 'insert' to table 'impala::default.aaa'

Change-Id: I6741f2584c1bc6b229d10d37297515474318f94c
Reviewed-on: http://gerrit.cloudera.org:8080/12252
Reviewed-by: Grant Henke <gr...@apache.org>
Tested-by: Will Berkeley <wd...@gmail.com>
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
2 files changed, 26 insertions(+), 3 deletions(-)

Approvals:
  Grant Henke: Looks good to me, approved
  Will Berkeley: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/12252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6741f2584c1bc6b229d10d37297515474318f94c
Gerrit-Change-Number: 12252
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark] Add some logging to trace KuduContext operations

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/12252 )

Change subject: [spark] Add some logging to trace KuduContext operations
......................................................................


Patch Set 2: Verified+1

I didn't rebase to pick up the fix for the Python pandas issue.


-- 
To view, visit http://gerrit.cloudera.org:8080/12252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6741f2584c1bc6b229d10d37297515474318f94c
Gerrit-Change-Number: 12252
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 23 Jan 2019 21:39:31 +0000
Gerrit-HasComments: No

[kudu-CR] [spark] Add some logging to trace KuduContext operations

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12252 )

Change subject: [spark] Add some logging to trace KuduContext operations
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6741f2584c1bc6b229d10d37297515474318f94c
Gerrit-Change-Number: 12252
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 23 Jan 2019 20:18:49 +0000
Gerrit-HasComments: No

[kudu-CR] [spark] Add some logging to trace KuduContext operations

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/12252 )

Change subject: [spark] Add some logging to trace KuduContext operations
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12252/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/12252/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@246
PS1, Line 246: log.info
Maybe, move this into writeRows() itself?  It seems there are enough parameters for writeRows() to differentiate among various ops and tables.



-- 
To view, visit http://gerrit.cloudera.org:8080/12252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6741f2584c1bc6b229d10d37297515474318f94c
Gerrit-Change-Number: 12252
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 23 Jan 2019 05:12:17 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] Add some logging to trace KuduContext operations

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12252 )

Change subject: [spark] Add some logging to trace KuduContext operations
......................................................................


Patch Set 1: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6741f2584c1bc6b229d10d37297515474318f94c
Gerrit-Change-Number: 12252
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 23 Jan 2019 02:37:31 +0000
Gerrit-HasComments: No