You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Philip Zeyliger (Code Review)" <ge...@cloudera.org> on 2019/03/05 01:26:20 UTC

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12660


Change subject: IMPALA-8250: Clean up JNI warnings.
......................................................................

IMPALA-8250: Clean up JNI warnings.

Using HDFS_OPTS+="-Xcheck:jni" revealed a handful of warnings related to
(a) checking for exceptions and (b) leaking local references.

Checking for exceptions required sprinkling RETURN_ERROR_IF_EXC
left and right. I chose not to expand the JniCall infrastructure
to handle this more generally at the moment.

The leaky local references are a bit harder. In the logs, they show up
as "WARNING: JNI local refs: 2597, exceeds capacity: 35" or similar. A
few of these errors seem to be not in our code.  The ones that I've
found in our code stemmed from HBaseTableScanner::GetRowKey(): this
method uses local references and wasn't returning them. Using a
JniLocalFrame seems to have taken care of the warnings.

I have added code to skip test_large_strings when JNI checking is
enabled. This test takes forever (presumably because JNI is checking
bounds on strings very aggressively), and times out. The time out also
causes some metric-related checks to fail (since a query is still in
flight).

Debugging this required customizing my JDK to give stack traces
when these warnings occurred. The following diff facilitated
this.

  diff -r 76a9c9cf14f1 src/share/vm/prims/jniCheck.cpp
  --- a/src/share/vm/prims/jniCheck.cpp	Tue Jan 15 10:43:31 2019 +0000
  +++ b/src/share/vm/prims/jniCheck.cpp	Wed Feb 27 11:57:13 2019 -0800
  @@ -143,11 +143,30 @@
   static const char * fatal_instance_field_mismatch = "Field type (instance) mismatch in JNI get/set field operations";
   static const char * fatal_non_string = "JNI string operation received a non-string";

  +// thisone: whether to print every time, or maybe, depending on future
  +// how many future stacks we want printed (totally racy); helps catch
  +// missing exception handling if there's a way to tickle that code
  +// reliably.
  +static inline void dump_native_stack(JavaThread* thr, bool thisone, int future) {
  +  static int fut_stacks = 0; // racy!
  +  if (fut_stacks > 0) {
  +    thisone = true;
  +    fut_stacks--;
  +  }
  +  if (future > 0) fut_stacks = future;
  +  if (thisone) {
  +    frame fr = os::current_frame();
  +    char buf[6000];
  +    tty->print_cr("Thread: %s %d", thr->get_thread_name(), thr->osthread()->thread_id());
  +    print_native_stack(tty, fr, thr, buf, sizeof(buf));
  +  }
  +}

   // When in VM state:
   static void ReportJNIWarning(JavaThread* thr, const char *msg) {
     tty->print_cr("WARNING in native method: %s", msg);
     thr->print_stack();
  +  dump_native_stack(thr, true, 0);
   }

   // When in NATIVE state:
  @@ -199,11 +218,14 @@
         tty->print_cr("WARNING in native method: JNI call made without checking exceptions when required to from %s",
           thr->get_pending_jni_exception_check());
         thr->print_stack();
  +      dump_native_stack(thr, true, 10);
       )
       thr->clear_pending_jni_exception_check(); // Just complain once
     }
   }

  +
  +
   /**
    * Add to the planned number of handles. I.e. plus current live & warning threshold
    */
  @@ -254,9 +276,12 @@
         tty->print_cr("WARNING: JNI local refs: %zu, exceeds capacity: %zu",
             live_handles, planned_capacity);
         thr->print_stack();
  +      dump_native_stack(thr, true, 0);
       )
       // Complain just the once, reset to current + warn threshold
       add_planned_handle_capacity(handles, 0);
  +  } else {
  +    dump_native_stack(thr, false, 0);
     }
   }

Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
---
M be/src/catalog/catalog.cc
M be/src/exec/hbase-table-scanner.cc
M be/src/exprs/hive-udf-call.cc
M be/src/runtime/hbase-table-factory.cc
M be/src/service/frontend.cc
M tests/query_test/test_insert.py
6 files changed, 43 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/12660/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
......................................................................


Patch Set 4: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 08 Mar 2019 03:35:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
......................................................................


Patch Set 3:

> "I've kicked off a suite of our tests with code coverage on. I'm
 > also going to kick off a set of tests with the JNI checking and
 > 'exhaustive' tests. Maybe there's coverage, but we're not running
 > the queries against HBase."
 > 
 > Did anything new come up in this?
 > 

Nothing new. I'm still chasing a sequence of these calls which lead to CallObjectMethod() not having its exception checked...

checked_jni_CallVoidMethodV
checked_jni_ExceptionOccurred
checked_jni_ExceptionClear
checked_jni_GetObjectClass
checked_jni_FindClass
checked_jni_GetMethodID
checked_jni_CallObjectMethod

I can trigger this with

set mem_limit=50m;use functional;insert into table alltypesinsert
partition (year, month) /* +noclustered */
select at1.id, at1.bool_col, at1.tinyint_col, at1.smallint_col, at1.int_col, at1.bigint_col,
  at1.float_col, at1.double_col, at1.date_string_col, at1.string_col, at1.timestamp_col,
  at1.year, at2.id as month
from  functional.alltypes at1, functional.alltypes at2;

when the sink gets closed after a failure.

I don't know where this stack of calls is at this point. I'll keep looking, but I don't want to block on it.


-- 
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 07 Mar 2019 23:08:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2348/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 05 Mar 2019 02:07:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3891/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 07 Mar 2019 23:09:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
......................................................................


Patch Set 4: Code-Review+2

Thanks!


-- 
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 07 Mar 2019 23:13:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
......................................................................


Patch Set 3: Code-Review+1

"I've kicked off a suite of our tests with code coverage on. I'm also going to kick off a set of tests with the JNI checking and 'exhaustive' tests. Maybe there's coverage, but we're not running the queries against HBase."

Did anything new come up in this?

Patch LGTM.


-- 
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 07 Mar 2019 18:47:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Hello Sahil Takiar, Todd Lipcon, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12660

to look at the new patch set (#3).

Change subject: IMPALA-8250: Clean up JNI warnings.
......................................................................

IMPALA-8250: Clean up JNI warnings.

Using LIBHDFS_OPTS+="-Xcheck:jni" revealed a handful of warnings related to
(a) checking for exceptions and (b) leaking local references.

Checking for exceptions required sprinkling RETURN_ERROR_IF_EXC
left and right. I chose not to expand the JniCall infrastructure
to handle this more generally at the moment.

The leaky local references are a bit harder. In the logs, they show up
as "WARNING: JNI local refs: 2597, exceeds capacity: 35" or similar. A
few of these errors seem to be not in our code.  The ones that I've
found in our code stemmed from HBaseTableScanner::GetRowKey(): this
method uses local references and wasn't returning them. Using a
JniLocalFrame seems to have taken care of the warnings.

I have added code to skip test_large_strings when JNI checking is
enabled. This test takes forever (presumably because JNI is checking
bounds on strings very aggressively), and times out. The time out also
causes some metric-related checks to fail (since a query is still in
flight).

Debugging this required customizing my JDK to give stack traces
when these warnings occurred. The following diff facilitated
this.

  diff -r 76a9c9cf14f1 src/share/vm/prims/jniCheck.cpp
  --- a/src/share/vm/prims/jniCheck.cpp	Tue Jan 15 10:43:31 2019 +0000
  +++ b/src/share/vm/prims/jniCheck.cpp	Wed Feb 27 11:57:13 2019 -0800
  @@ -143,11 +143,30 @@
   static const char * fatal_instance_field_mismatch = "Field type (instance) mismatch in JNI get/set field operations";
   static const char * fatal_non_string = "JNI string operation received a non-string";

  +// thisone: whether to print every time, or maybe, depending on future
  +// how many future stacks we want printed (totally racy); helps catch
  +// missing exception handling if there's a way to tickle that code
  +// reliably.
  +static inline void dump_native_stack(JavaThread* thr, bool thisone, int future) {
  +  static int fut_stacks = 0; // racy!
  +  if (fut_stacks > 0) {
  +    thisone = true;
  +    fut_stacks--;
  +  }
  +  if (future > 0) fut_stacks = future;
  +  if (thisone) {
  +    frame fr = os::current_frame();
  +    char buf[6000];
  +    tty->print_cr("Thread: %s %d", thr->get_thread_name(), thr->osthread()->thread_id());
  +    print_native_stack(tty, fr, thr, buf, sizeof(buf));
  +  }
  +}

   // When in VM state:
   static void ReportJNIWarning(JavaThread* thr, const char *msg) {
     tty->print_cr("WARNING in native method: %s", msg);
     thr->print_stack();
  +  dump_native_stack(thr, true, 0);
   }

   // When in NATIVE state:
  @@ -199,11 +218,14 @@
         tty->print_cr("WARNING in native method: JNI call made without checking exceptions when required to from %s",
           thr->get_pending_jni_exception_check());
         thr->print_stack();
  +      dump_native_stack(thr, true, 10);
       )
       thr->clear_pending_jni_exception_check(); // Just complain once
     }
   }

  +
  +
   /**
    * Add to the planned number of handles. I.e. plus current live & warning threshold
    */
  @@ -254,9 +276,12 @@
         tty->print_cr("WARNING: JNI local refs: %zu, exceeds capacity: %zu",
             live_handles, planned_capacity);
         thr->print_stack();
  +      dump_native_stack(thr, true, 0);
       )
       // Complain just the once, reset to current + warn threshold
       add_planned_handle_capacity(handles, 0);
  +  } else {
  +    dump_native_stack(thr, false, 0);
     }
   }

Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
---
M be/src/catalog/catalog.cc
M be/src/exec/hbase-table-scanner.cc
M be/src/exprs/hive-udf-call.cc
M be/src/runtime/hbase-table-factory.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M tests/query_test/test_insert.py
7 files changed, 55 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/12660/3
-- 
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
......................................................................


Patch Set 1:

(3 comments)

> (3 comments)
 > 
 > I'm seeing a lot of places in the HBase scanner where we call
 > GetByteArrayRegion or GetObjectArrayElement without checking for
 > exceptions (these can both throw ArrayIndexOutOfBoundsExceptions).
 > I'm not sure why -Xcheck:jni wouldn't detect these, but perhaps be
 > should do a scrub of the rest of the Impala code to make sure we
 > are checking for exceptions after each array call (there are
 > several other array-based JNI ops that throw ArrayIndexOutOfBoundsException
 > - https://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/functions.html).
 > We can do that in a different patch though since it might be a
 > larger change.

Yeah: this is surprising. I read the jniCheck.cpp code and it should be handling the GetObjectArray... methods. The most likely scenario is that our tests don't have coverage here. Or jniCheck isn't doing its business.

I'm going to sprinkle in some of these where it's obvious.

I've kicked off a suite of our tests with code coverage on. I'm also going to kick off a set of tests with the JNI checking and 'exhaustive' tests. Maybe there's coverage, but we're not running the queries against HBase.


This discussion (from jniCheck.cpp) is also interesting. I didn't see the code that didn't warn about ArrayIndexOutOfBounds, but maybe that's what's happening.


/**
 * Check whether or not a programmer has actually checked for exceptions. According
 * to the JNI Specification ("jni/spec/design.html#java_exceptions"):
 *
 * There are two cases where the programmer needs to check for exceptions without
 * being able to first check an error code:
 *
 * - The JNI functions that invoke a Java method return the result of the Java method.
 * The programmer must call ExceptionOccurred() to check for possible exceptions
 * that occurred during the execution of the Java method.
 *
 * - Some of the JNI array access functions do not return an error code, but may
 * throw an ArrayIndexOutOfBoundsException or ArrayStoreException.
 *
 * In all other cases, a non-error return value guarantees that no exceptions have been thrown.
 *
 * Programmers often defend against ArrayIndexOutOfBoundsException, so warning
 * for these functions would be pedantic.
 */

http://gerrit.cloudera.org:8080/#/c/12660/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12660/1//COMMIT_MSG@9
PS1, Line 9: Using HDFS_OPTS+="-Xcheck:jni" revealed a handful of warnings related to
> It's LIBHDFS_OPTS right?
Done


http://gerrit.cloudera.org:8080/#/c/12660/1/be/src/runtime/hbase-table-factory.cc
File be/src/runtime/hbase-table-factory.cc:

http://gerrit.cloudera.org:8080/#/c/12660/1/be/src/runtime/hbase-table-factory.cc@97
PS1, Line 97:     if (!s.ok()) LOG(INFO) << "Exception when cleaning up HBase" << s;
> nit: space after HBase
Done


http://gerrit.cloudera.org:8080/#/c/12660/1/be/src/service/frontend.cc
File be/src/service/frontend.cc:

http://gerrit.cloudera.org:8080/#/c/12660/1/be/src/service/frontend.cc@116
PS1, Line 116:     ABORT_IF_ERROR(JniUtil::LoadJniMethod(jni_env, fe_class_, &(methods[i])));
> Might be understanding the code wrong, but shouldn't these be global refs s
Based on https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#accessing_fields_and_methods , https://stackoverflow.com/questions/2093112/why-i-should-not-reuse-a-jclass-and-or-jmethodid-in-jni and http://www.latkin.org/blog/2016/02/01/jni-object-lifetimes-quick-reference/ (very nice!) and https://developer.android.com/training/articles/perf-jni.html#local_and_global_references say that jmethodid (which is the type here) doesn't need global references, but is invalidated if the class gets unloaded. Fortunately, the class is kept alive because fe_ stores it.

I thought fe_class_ had the problem you suggest (it's a local reference), but it turns out that it's not used outside this method. I removed it from the class to make that obvious.



-- 
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 05 Mar 2019 20:34:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12660/2/be/src/exec/hbase-table-scanner.cc
File be/src/exec/hbase-table-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12660/2/be/src/exec/hbase-table-scanner.cc@562
PS2, Line 562:   RETURN_ERROR_IF_EXC(env);
I think GetArrayLength does not throw an exception, so this check is probably unnecessary.



-- 
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 05 Mar 2019 22:46:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Hello Sahil Takiar, Todd Lipcon, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12660

to look at the new patch set (#2).

Change subject: IMPALA-8250: Clean up JNI warnings.
......................................................................

IMPALA-8250: Clean up JNI warnings.

Using LIBHDFS_OPTS+="-Xcheck:jni" revealed a handful of warnings related to
(a) checking for exceptions and (b) leaking local references.

Checking for exceptions required sprinkling RETURN_ERROR_IF_EXC
left and right. I chose not to expand the JniCall infrastructure
to handle this more generally at the moment.

The leaky local references are a bit harder. In the logs, they show up
as "WARNING: JNI local refs: 2597, exceeds capacity: 35" or similar. A
few of these errors seem to be not in our code.  The ones that I've
found in our code stemmed from HBaseTableScanner::GetRowKey(): this
method uses local references and wasn't returning them. Using a
JniLocalFrame seems to have taken care of the warnings.

I have added code to skip test_large_strings when JNI checking is
enabled. This test takes forever (presumably because JNI is checking
bounds on strings very aggressively), and times out. The time out also
causes some metric-related checks to fail (since a query is still in
flight).

Debugging this required customizing my JDK to give stack traces
when these warnings occurred. The following diff facilitated
this.

  diff -r 76a9c9cf14f1 src/share/vm/prims/jniCheck.cpp
  --- a/src/share/vm/prims/jniCheck.cpp	Tue Jan 15 10:43:31 2019 +0000
  +++ b/src/share/vm/prims/jniCheck.cpp	Wed Feb 27 11:57:13 2019 -0800
  @@ -143,11 +143,30 @@
   static const char * fatal_instance_field_mismatch = "Field type (instance) mismatch in JNI get/set field operations";
   static const char * fatal_non_string = "JNI string operation received a non-string";

  +// thisone: whether to print every time, or maybe, depending on future
  +// how many future stacks we want printed (totally racy); helps catch
  +// missing exception handling if there's a way to tickle that code
  +// reliably.
  +static inline void dump_native_stack(JavaThread* thr, bool thisone, int future) {
  +  static int fut_stacks = 0; // racy!
  +  if (fut_stacks > 0) {
  +    thisone = true;
  +    fut_stacks--;
  +  }
  +  if (future > 0) fut_stacks = future;
  +  if (thisone) {
  +    frame fr = os::current_frame();
  +    char buf[6000];
  +    tty->print_cr("Thread: %s %d", thr->get_thread_name(), thr->osthread()->thread_id());
  +    print_native_stack(tty, fr, thr, buf, sizeof(buf));
  +  }
  +}

   // When in VM state:
   static void ReportJNIWarning(JavaThread* thr, const char *msg) {
     tty->print_cr("WARNING in native method: %s", msg);
     thr->print_stack();
  +  dump_native_stack(thr, true, 0);
   }

   // When in NATIVE state:
  @@ -199,11 +218,14 @@
         tty->print_cr("WARNING in native method: JNI call made without checking exceptions when required to from %s",
           thr->get_pending_jni_exception_check());
         thr->print_stack();
  +      dump_native_stack(thr, true, 10);
       )
       thr->clear_pending_jni_exception_check(); // Just complain once
     }
   }

  +
  +
   /**
    * Add to the planned number of handles. I.e. plus current live & warning threshold
    */
  @@ -254,9 +276,12 @@
         tty->print_cr("WARNING: JNI local refs: %zu, exceeds capacity: %zu",
             live_handles, planned_capacity);
         thr->print_stack();
  +      dump_native_stack(thr, true, 0);
       )
       // Complain just the once, reset to current + warn threshold
       add_planned_handle_capacity(handles, 0);
  +  } else {
  +    dump_native_stack(thr, false, 0);
     }
   }

Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
---
M be/src/catalog/catalog.cc
M be/src/exec/hbase-table-scanner.cc
M be/src/exprs/hive-udf-call.cc
M be/src/runtime/hbase-table-factory.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M tests/query_test/test_insert.py
7 files changed, 56 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/12660/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
......................................................................

IMPALA-8250: Clean up JNI warnings.

Using LIBHDFS_OPTS+="-Xcheck:jni" revealed a handful of warnings related to
(a) checking for exceptions and (b) leaking local references.

Checking for exceptions required sprinkling RETURN_ERROR_IF_EXC
left and right. I chose not to expand the JniCall infrastructure
to handle this more generally at the moment.

The leaky local references are a bit harder. In the logs, they show up
as "WARNING: JNI local refs: 2597, exceeds capacity: 35" or similar. A
few of these errors seem to be not in our code.  The ones that I've
found in our code stemmed from HBaseTableScanner::GetRowKey(): this
method uses local references and wasn't returning them. Using a
JniLocalFrame seems to have taken care of the warnings.

I have added code to skip test_large_strings when JNI checking is
enabled. This test takes forever (presumably because JNI is checking
bounds on strings very aggressively), and times out. The time out also
causes some metric-related checks to fail (since a query is still in
flight).

Debugging this required customizing my JDK to give stack traces
when these warnings occurred. The following diff facilitated
this.

  diff -r 76a9c9cf14f1 src/share/vm/prims/jniCheck.cpp
  --- a/src/share/vm/prims/jniCheck.cpp	Tue Jan 15 10:43:31 2019 +0000
  +++ b/src/share/vm/prims/jniCheck.cpp	Wed Feb 27 11:57:13 2019 -0800
  @@ -143,11 +143,30 @@
   static const char * fatal_instance_field_mismatch = "Field type (instance) mismatch in JNI get/set field operations";
   static const char * fatal_non_string = "JNI string operation received a non-string";

  +// thisone: whether to print every time, or maybe, depending on future
  +// how many future stacks we want printed (totally racy); helps catch
  +// missing exception handling if there's a way to tickle that code
  +// reliably.
  +static inline void dump_native_stack(JavaThread* thr, bool thisone, int future) {
  +  static int fut_stacks = 0; // racy!
  +  if (fut_stacks > 0) {
  +    thisone = true;
  +    fut_stacks--;
  +  }
  +  if (future > 0) fut_stacks = future;
  +  if (thisone) {
  +    frame fr = os::current_frame();
  +    char buf[6000];
  +    tty->print_cr("Thread: %s %d", thr->get_thread_name(), thr->osthread()->thread_id());
  +    print_native_stack(tty, fr, thr, buf, sizeof(buf));
  +  }
  +}

   // When in VM state:
   static void ReportJNIWarning(JavaThread* thr, const char *msg) {
     tty->print_cr("WARNING in native method: %s", msg);
     thr->print_stack();
  +  dump_native_stack(thr, true, 0);
   }

   // When in NATIVE state:
  @@ -199,11 +218,14 @@
         tty->print_cr("WARNING in native method: JNI call made without checking exceptions when required to from %s",
           thr->get_pending_jni_exception_check());
         thr->print_stack();
  +      dump_native_stack(thr, true, 10);
       )
       thr->clear_pending_jni_exception_check(); // Just complain once
     }
   }

  +
  +
   /**
    * Add to the planned number of handles. I.e. plus current live & warning threshold
    */
  @@ -254,9 +276,12 @@
         tty->print_cr("WARNING: JNI local refs: %zu, exceeds capacity: %zu",
             live_handles, planned_capacity);
         thr->print_stack();
  +      dump_native_stack(thr, true, 0);
       )
       // Complain just the once, reset to current + warn threshold
       add_planned_handle_capacity(handles, 0);
  +  } else {
  +    dump_native_stack(thr, false, 0);
     }
   }

Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Reviewed-on: http://gerrit.cloudera.org:8080/12660
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/catalog/catalog.cc
M be/src/exec/hbase-table-scanner.cc
M be/src/exprs/hive-udf-call.cc
M be/src/runtime/hbase-table-factory.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M tests/query_test/test_insert.py
7 files changed, 55 insertions(+), 10 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

Posted by "Sahil Takiar (Code Review)" <ge...@cloudera.org>.
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
......................................................................


Patch Set 1:

(3 comments)

I'm seeing a lot of places in the HBase scanner where we call GetByteArrayRegion or GetObjectArrayElement without checking for exceptions (these can both throw ArrayIndexOutOfBoundsExceptions). I'm not sure why -Xcheck:jni wouldn't detect these, but perhaps be should do a scrub of the rest of the Impala code to make sure we are checking for exceptions after each array call (there are several other array-based JNI ops that throw ArrayIndexOutOfBoundsException - https://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/functions.html). We can do that in a different patch though since it might be a larger change.

http://gerrit.cloudera.org:8080/#/c/12660/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12660/1//COMMIT_MSG@9
PS1, Line 9: Using HDFS_OPTS+="-Xcheck:jni" revealed a handful of warnings related to
It's LIBHDFS_OPTS right?


http://gerrit.cloudera.org:8080/#/c/12660/1/be/src/runtime/hbase-table-factory.cc
File be/src/runtime/hbase-table-factory.cc:

http://gerrit.cloudera.org:8080/#/c/12660/1/be/src/runtime/hbase-table-factory.cc@97
PS1, Line 97:     if (!s.ok()) LOG(INFO) << "Exception when cleaning up HBase" << s;
nit: space after HBase


http://gerrit.cloudera.org:8080/#/c/12660/1/be/src/service/frontend.cc
File be/src/service/frontend.cc:

http://gerrit.cloudera.org:8080/#/c/12660/1/be/src/service/frontend.cc@116
PS1, Line 116:     ABORT_IF_ERROR(JniUtil::LoadJniMethod(jni_env, fe_class_, &(methods[i])));
Might be understanding the code wrong, but shouldn't these be global refs since they are used outside the scope of this method?



-- 
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 05 Mar 2019 17:53:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2375/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 06 Mar 2019 18:59:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
......................................................................


Patch Set 2:

(1 comment)

Yep, good catch. Thanks.

http://gerrit.cloudera.org:8080/#/c/12660/2/be/src/exec/hbase-table-scanner.cc
File be/src/exec/hbase-table-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12660/2/be/src/exec/hbase-table-scanner.cc@562
PS2, Line 562:   RETURN_ERROR_IF_EXC(env);
> I think GetArrayLength does not throw an exception, so this check is probab
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 06 Mar 2019 18:13:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2359/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <st...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 05 Mar 2019 21:19:45 +0000
Gerrit-HasComments: No