You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Hao Hao (Code Review)" <ge...@cloudera.org> on 2018/01/11 07:23:06 UTC

[kudu-CR] KUDU-2254: Detect and warn about misusage of KuduContext

Hao Hao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9004


Change subject: KUDU-2254: Detect and warn about misusage of KuduContext
......................................................................

KUDU-2254: Detect and warn about misusage of KuduContext

Since KuduContext carries a bunch of information for client connection,
such as authentication token, KuduContext should be created in the
driver and shared with executors. It would be great to provide a way
to warn users (e.g Logging) about such behavior since this kind of
issues are very subtle to detect.

This patch adds logic to detect and warn about this kind of misusage of
KuduContecx based on TaskContext information. A corresponding unit test
is also added.

Change-Id: I65a7cd11a14fa8a668079b0d1fcf6ed3a34fb652
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduContextTest.scala
2 files changed, 43 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/9004/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I65a7cd11a14fa8a668079b0d1fcf6ed3a34fb652
Gerrit-Change-Number: 9004
Gerrit-PatchSet: 1
Gerrit-Owner: Hao Hao <ha...@cloudera.com>

[kudu-CR] KUDU-2254: Detect and warn about misusage of KuduContext

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9004 )

Change subject: KUDU-2254: Detect and warn about misusage of KuduContext
......................................................................


Patch Set 2:

Hmm, I thought we'd seen a case where a user had created a KuduContext inside their executor somehow. Did it turn out that we misdiagnosed that issue?


-- 
To view, visit http://gerrit.cloudera.org:8080/9004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I65a7cd11a14fa8a668079b0d1fcf6ed3a34fb652
Gerrit-Change-Number: 9004
Gerrit-PatchSet: 2
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 12 Jan 2018 00:54:36 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2254: Detect and warn about misusage of KuduContext

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/9004 )

Change subject: KUDU-2254: Detect and warn about misusage of KuduContext
......................................................................


Patch Set 2:

I would like to abandon this code review if there is no objections, given the reason responded in previous comment, repasted it here.

"This patch may not be necessary. Since creation of KuduContext needs SparkContext, while SparkContext cannot be serializable to executors. That means the users need to create a new SparkContext in the executor to be able to create KuduContext, and SparkContext should be a driver specific entity."


-- 
To view, visit http://gerrit.cloudera.org:8080/9004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I65a7cd11a14fa8a668079b0d1fcf6ed3a34fb652
Gerrit-Change-Number: 9004
Gerrit-PatchSet: 2
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 12 Jan 2018 00:52:04 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2254: Detect and warn about misusage of KuduContext

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has abandoned this change. ( http://gerrit.cloudera.org:8080/9004 )

Change subject: KUDU-2254: Detect and warn about misusage of KuduContext
......................................................................


Abandoned

Per above discussion, this no longer seems relevant
-- 
To view, visit http://gerrit.cloudera.org:8080/9004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I65a7cd11a14fa8a668079b0d1fcf6ed3a34fb652
Gerrit-Change-Number: 9004
Gerrit-PatchSet: 2
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2254: Detect and warn about misusage of KuduContext

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/9004 )

Change subject: KUDU-2254: Detect and warn about misusage of KuduContext
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9004/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9004/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@91
PS1, Line 91:   // Checking if KuduContext is created inside an active task,
> The logger inside the object on line 316 should be available here.  'val' f
Done


http://gerrit.cloudera.org:8080/#/c/9004/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@94
PS1, Line 94:   if (taskContext != null) {
> mark this private[kudu]
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I65a7cd11a14fa8a668079b0d1fcf6ed3a34fb652
Gerrit-Change-Number: 9004
Gerrit-PatchSet: 2
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 11 Jan 2018 19:12:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2254: Detect and warn about misusage of KuduContext

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/9004 )

Change subject: KUDU-2254: Detect and warn about misusage of KuduContext
......................................................................


Patch Set 2:

> Hmm, I thought we'd seen a case where a user had created a
 > KuduContext inside their executor somehow. Did it turn out that we
 > misdiagnosed that issue?

Right, unfortunately, I think it turned out to be a misdiagnosis. Even though the user is creating KuduContext inside foreachRDD, it is actually executed in the driver(according to spark streaming guide https://spark.apache.org/docs/latest/streaming-programming-guide.html#output-operations-on-dstreams) unlike RDD.foreachPartition etc.


-- 
To view, visit http://gerrit.cloudera.org:8080/9004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I65a7cd11a14fa8a668079b0d1fcf6ed3a34fb652
Gerrit-Change-Number: 9004
Gerrit-PatchSet: 2
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 12 Jan 2018 01:07:49 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2254: Detect and warn about misusage of KuduContext

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9004 )

Change subject: KUDU-2254: Detect and warn about misusage of KuduContext
......................................................................


Patch Set 2:

Got it. Sounds reasonable to abandon this then, if we're convinced it's not possible to make a KuduContext in an executor :)


-- 
To view, visit http://gerrit.cloudera.org:8080/9004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I65a7cd11a14fa8a668079b0d1fcf6ed3a34fb652
Gerrit-Change-Number: 9004
Gerrit-PatchSet: 2
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 12 Jan 2018 05:23:48 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2254: Detect and warn about misusage of KuduContext

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hello Dan Burkert, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9004

to look at the new patch set (#2).

Change subject: KUDU-2254: Detect and warn about misusage of KuduContext
......................................................................

KUDU-2254: Detect and warn about misusage of KuduContext

Since KuduContext carries a bunch of information for client connection,
such as authentication token, KuduContext should be created in the
driver and shared with executors. It would be great to provide a way
to warn users (e.g Logging) about such behavior since this kind of
issues are very subtle to detect.

This patch adds logic to detect and warn about this kind of misusage of
KuduContecx based on TaskContext information. A corresponding unit test
is also added.

Change-Id: I65a7cd11a14fa8a668079b0d1fcf6ed3a34fb652
---
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduContextTest.scala
2 files changed, 42 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/9004/2
-- 
To view, visit http://gerrit.cloudera.org:8080/9004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65a7cd11a14fa8a668079b0d1fcf6ed3a34fb652
Gerrit-Change-Number: 9004
Gerrit-PatchSet: 2
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-2254: Detect and warn about misusage of KuduContext

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9004 )

Change subject: KUDU-2254: Detect and warn about misusage of KuduContext
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9004/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9004/2//COMMIT_MSG@16
PS2, Line 16: KuduContecx
nit: typo


http://gerrit.cloudera.org:8080/#/c/9004/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9004/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@93
PS2, Line 93:   private[kudu] val taskContext = TaskContext.get
does this field now end up serialized? Given this field isn't marked @transient, and it's a serializable type, I'm guessing it would get serialized into the KuduContext.

That said, maybe it's fine because we expect KuduContext to only be serialized on the driver and deserialized on the executor, and in the driver, TaskContext is always null?



-- 
To view, visit http://gerrit.cloudera.org:8080/9004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I65a7cd11a14fa8a668079b0d1fcf6ed3a34fb652
Gerrit-Change-Number: 9004
Gerrit-PatchSet: 2
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 11 Jan 2018 23:56:35 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2254: Detect and warn about misusage of KuduContext

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/9004 )

Change subject: KUDU-2254: Detect and warn about misusage of KuduContext
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9004/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9004/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@93
PS2, Line 93:   private[kudu] val taskContext = TaskContext.get
> does this field now end up serialized? Given this field isn't marked @trans
It is hard to be certain that TaskContext is always null in the driver but not the executor based on my knowledge. At least from what I have experiments it is this case (also according to the spark documentation).

However, I had a discussion with Dan offline, that this patch may not be necessary. Since creation of KuduContext needs SparkContext, while SparkContext cannot be serializable to executors. That means the users need to create a new SparkContext in the executor to be able to create KuduContext, and SparkContext should be a driver specific entity.



-- 
To view, visit http://gerrit.cloudera.org:8080/9004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I65a7cd11a14fa8a668079b0d1fcf6ed3a34fb652
Gerrit-Change-Number: 9004
Gerrit-PatchSet: 2
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 12 Jan 2018 00:49:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2254: Detect and warn about misusage of KuduContext

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9004 )

Change subject: KUDU-2254: Detect and warn about misusage of KuduContext
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9004/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9004/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@91
PS1, Line 91:   val Log: Logger = LoggerFactory.getLogger(classOf[KuduContext])
The logger inside the object on line 316 should be available here.  'val' fields inside the object (as opposed to the class) are Scala's version of static fields, so that's why we prefer that.


http://gerrit.cloudera.org:8080/#/c/9004/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@94
PS1, Line 94:   val taskContext = TaskContext.get
mark this private[kudu]



-- 
To view, visit http://gerrit.cloudera.org:8080/9004
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I65a7cd11a14fa8a668079b0d1fcf6ed3a34fb652
Gerrit-Change-Number: 9004
Gerrit-PatchSet: 1
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 11 Jan 2018 17:49:21 +0000
Gerrit-HasComments: Yes