You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2019/01/22 23:24:15 UTC

[kudu-CR] [spark] Add metrics to kudu-spark

Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12251


Change subject: [spark] Add metrics to kudu-spark
......................................................................

[spark] Add metrics to kudu-spark

This adds some basic accumulator metrics to kudu-spark. These
accumulators appear for each stage involving a KuduContext, with values
broken down by executor.

Follow-ups will add some more sophisticated metrics and use metrics to
add additional logging.

In addition to the unit tests, I tested this on a 3-node cluster and
confirmed I was able to see the accumulator values for stages and
individual tasks on the Spark application's web UI, and that were
correct.

Change-Id: Ied81c890b3d4510f767d8f023963ff878f398140
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala
6 files changed, 73 insertions(+), 23 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/12251/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ied81c890b3d4510f767d8f023963ff878f398140
Gerrit-Change-Number: 12251
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark] Add metrics to kudu-spark

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12251 )

Change subject: [spark] Add metrics to kudu-spark
......................................................................


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12251/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala:

http://gerrit.cloudera.org:8080/#/c/12251/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@156
PS1, Line 156:       rowsRead.add(currentIterator.getNumRows)
This is a bit tricky. The rows have been read from Kudu but not necessarily read/returned to the spark application via the get call below. 

The best/right choice of incrementing rowsRead here vs the get call isn't clear to me. Just out of interest, any reason you picked doing it here?



-- 
To view, visit http://gerrit.cloudera.org:8080/12251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied81c890b3d4510f767d8f023963ff878f398140
Gerrit-Change-Number: 12251
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 23 Jan 2019 02:35:03 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] Add metrics to kudu-spark

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12251 )

Change subject: [spark] Add metrics to kudu-spark
......................................................................

[spark] Add metrics to kudu-spark

This adds some basic accumulator metrics to kudu-spark. These
accumulators appear for each stage involving a KuduContext, with values
broken down by executor.

Follow-ups will add some more sophisticated metrics and use metrics to
add additional logging.

In addition to the unit tests, I tested this on a 3-node cluster and
confirmed I was able to see the accumulator values for stages and
individual tasks on the Spark application's web UI, and that were
correct.

Change-Id: Ied81c890b3d4510f767d8f023963ff878f398140
Reviewed-on: http://gerrit.cloudera.org:8080/12251
Reviewed-by: Grant Henke <gr...@apache.org>
Tested-by: Will Berkeley <wd...@gmail.com>
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala
6 files changed, 74 insertions(+), 23 deletions(-)

Approvals:
  Grant Henke: Looks good to me, approved
  Will Berkeley: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/12251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ied81c890b3d4510f767d8f023963ff878f398140
Gerrit-Change-Number: 12251
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark] Add metrics to kudu-spark

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/12251 )

Change subject: [spark] Add metrics to kudu-spark
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12251/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/12251/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@69
PS1, Line 69: KA
> A
Done


http://gerrit.cloudera.org:8080/#/c/12251/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala:

http://gerrit.cloudera.org:8080/#/c/12251/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@130
PS1, Line 130: val rowsRead: LongAccumulator
> doc?
Done


http://gerrit.cloudera.org:8080/#/c/12251/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@156
PS1, Line 156:       rowsRead.add(currentIterator.getNumRows)
> This is a bit tricky. The rows have been read from Kudu but not necessarily
The reason it is done here is because this is about performance measurement (how many rows did executor 1 read v. executor 2, and in how long?), as opposed to checking correctness (I don't see all 1000 rows that I expect...how many did each executor read?). I think for the former it is better to measure what is read from Kudu regardless of whether the rows are consumed by the Spark application.



-- 
To view, visit http://gerrit.cloudera.org:8080/12251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied81c890b3d4510f767d8f023963ff878f398140
Gerrit-Change-Number: 12251
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 23 Jan 2019 19:57:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] Add metrics to kudu-spark

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has removed a vote on this change.

Change subject: [spark] Add metrics to kudu-spark
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/12251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ied81c890b3d4510f767d8f023963ff878f398140
Gerrit-Change-Number: 12251
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark] Add metrics to kudu-spark

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/12251 )

Change subject: [spark] Add metrics to kudu-spark
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12251/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/12251/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@69
PS1, Line 69: KA
A



-- 
To view, visit http://gerrit.cloudera.org:8080/12251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied81c890b3d4510f767d8f023963ff878f398140
Gerrit-Change-Number: 12251
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 23 Jan 2019 00:13:20 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] Add metrics to kudu-spark

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12251 )

Change subject: [spark] Add metrics to kudu-spark
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied81c890b3d4510f767d8f023963ff878f398140
Gerrit-Change-Number: 12251
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 23 Jan 2019 20:18:36 +0000
Gerrit-HasComments: No

[kudu-CR] [spark] Add metrics to kudu-spark

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/12251 )

Change subject: [spark] Add metrics to kudu-spark
......................................................................


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12251/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala:

http://gerrit.cloudera.org:8080/#/c/12251/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@130
PS1, Line 130: val rowsRead: LongAccumulator
doc?



-- 
To view, visit http://gerrit.cloudera.org:8080/12251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied81c890b3d4510f767d8f023963ff878f398140
Gerrit-Change-Number: 12251
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 23 Jan 2019 00:51:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] [spark] Add metrics to kudu-spark

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12251

to look at the new patch set (#2).

Change subject: [spark] Add metrics to kudu-spark
......................................................................

[spark] Add metrics to kudu-spark

This adds some basic accumulator metrics to kudu-spark. These
accumulators appear for each stage involving a KuduContext, with values
broken down by executor.

Follow-ups will add some more sophisticated metrics and use metrics to
add additional logging.

In addition to the unit tests, I tested this on a 3-node cluster and
confirmed I was able to see the accumulator values for stages and
individual tasks on the Spark application's web UI, and that were
correct.

Change-Id: Ied81c890b3d4510f767d8f023963ff878f398140
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala
6 files changed, 74 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/12251/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ied81c890b3d4510f767d8f023963ff878f398140
Gerrit-Change-Number: 12251
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [spark] Add metrics to kudu-spark

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/12251 )

Change subject: [spark] Add metrics to kudu-spark
......................................................................


Patch Set 2: Verified+1

I didn't rebase to pick up the Python pandas issue fix.


-- 
To view, visit http://gerrit.cloudera.org:8080/12251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied81c890b3d4510f767d8f023963ff878f398140
Gerrit-Change-Number: 12251
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 23 Jan 2019 21:39:59 +0000
Gerrit-HasComments: No