Posted to user@arrow.apache.org by Ha Cao <Ha...@twosigma.com> on 2023/01/18 20:29:08 UTC

[Java][Arrow 7] Issue with row filtering using Gandiva expressions

Hi everyone,

I ran into an issue using Gandiva expressions to filter on a String column (see the testFilterOnStringColumn method in the attached GandivaFilterTest.java file). In testFilterOnIntColumn, I construct an OR expression in the same way for an Integer column and get correct results. In testFilterOnStringColumn, however, I randomize the values in the string column "type" to be one of ["A", "B", "C"] and build an OR expression that filters for values in ["A", "B", "C"]. If the filter worked correctly, I should get back the full table and "All rows match", but I don't. I run the filter in a loop: it works correctly for the first iteration/batch, but then returns incorrect results (partial match or no match) for the remaining iterations (see the attached results.txt file).
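
To make the expected behaviour concrete, here is a minimal plain-Java sketch of the result I would expect for each batch. It does not call Gandiva at all. The class name ExpectedFilterCheck and the helpers expectedSelection and randomBatch are hypothetical names made up for this illustration, and the batch size and seed are arbitrary. It only computes the row indices that an OR-of-equalities (or IN) filter over ["A", "B", "C"] should select, so the output of a real filter could be compared against it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical reference check; not part of Arrow or Gandiva.
public class ExpectedFilterCheck {

    // Returns the indices of rows whose value is in `allowed`. Null values
    // are treated as non-matching (an assumption made for this sketch).
    public static List<Integer> expectedSelection(List<String> column, Set<String> allowed) {
        List<Integer> selected = new ArrayList<>();
        for (int row = 0; row < column.size(); row++) {
            String value = column.get(row);
            if (value != null && allowed.contains(value)) {
                selected.add(row);
            }
        }
        return selected;
    }

    // Builds one batch of `rows` values drawn uniformly from `choices`.
    public static List<String> randomBatch(Random rng, String[] choices, int rows) {
        List<String> column = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            column.add(choices[rng.nextInt(choices.length)]);
        }
        return column;
    }

    public static void main(String[] args) {
        String[] choices = {"A", "B", "C"};
        Set<String> allowed = Set.of(choices);
        Random rng = new Random(42); // arbitrary seed

        // Every value comes from the allowed set, so every batch, not just
        // the first one, should report a full match.
        for (int batch = 0; batch < 5; batch++) {
            List<String> column = randomBatch(rng, choices, 100);
            List<Integer> expected = expectedSelection(column, allowed);
            boolean allMatch = expected.size() == column.size();
            System.out.println("batch " + batch + ": "
                + (allMatch ? "All rows match" : "Partial or no match"));
        }
    }
}
```

Since the column values are drawn only from the filtered set, this check selects every row in every batch, which is the result I expected from the Gandiva filter as well.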

I then followed the examples in FilterTest.java<https://github.com/apache/arrow/blob/master/java/gandiva/src/test/java/org/apache/arrow/gandiva/evaluator/FilterTest.java> to build an IN expression instead of an OR expression, but ran into a segmentation fault for both the Integer and the String column:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f9a5c116338, pid=702734, tid=702735
#
# JRE version: OpenJDK Runtime Environment (17.0.4.1) (build 17.0.4.1-tsjgss+0)
# Java VM: OpenJDK 64-Bit Server VM (17.0.4.1-tsjgss+0, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libgandiva_jni.so359dc1bb-45fd-48bd-97d3-18691d4fb21e+0x54c338]  gdv_fn_in_expr_lookup_int32+0x28
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

I found a ticket on GitHub that looks somewhat relevant to these issues, https://github.com/apache/arrow/issues/20199, but it does not contain much detail, so I am not sure what its status is or whether it actually covers the issues I ran into.
I am using Arrow 7. Because of a limitation on my end, I haven't been able to test this with Arrow 10.
Please let me know if you know what causes this, or of any related tickets or workarounds. Any information is appreciated. Thank you so much!
Best,
Ha