Posted to issues@hbase.apache.org by "Hristo Iliev (Jira)" <ji...@apache.org> on 2021/08/20 14:41:00 UTC

[jira] [Created] (HBASE-26211) [hbase-connectors] Pushdown filters in Spark do not work correctly with long types

Hristo Iliev created HBASE-26211:
------------------------------------

             Summary: [hbase-connectors] Pushdown filters in Spark do not work correctly with long types
                 Key: HBASE-26211
                 URL: https://issues.apache.org/jira/browse/HBASE-26211
             Project: HBase
          Issue Type: Bug
          Components: hbase-connectors
    Affects Versions: 1.0.0
            Reporter: Hristo Iliev


Reading from an HBase table and filtering on a LONG column does not seem to work correctly.

{{Dataset<Row> df = spark.read()
    .format("org.apache.hadoop.hbase.spark")
    .option("hbase.columns.mapping", "id STRING :key, v LONG cf:v")
    ...
    .load();
df.filter("v > 100").show();}}

The expected behaviour is to show the rows where cf:v > 100, but an empty dataset is returned instead.

Moreover, replacing {{"v > 100"}} with {{"v >= 100"}} produces a dataset in which some rows have values of v less than 100.

The problem appears to be that long values are decoded incorrectly, as 32-bit integers, in {{NaiveEncoder.filter}}, so only the high-order four bytes of the big-endian encoding take part in the comparison:

{{case LongEnc | TimestampEnc =>
   val in = Bytes.toInt(input, offset1)
   val value = Bytes.toInt(filterBytes, offset2 + 1)
   compare(in.compareTo(value), ops)}}
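
Decoded this way, every small non-negative long has the same high-order word, so the comparison degenerates. A minimal, self-contained sketch (the object name and values are illustrative) of what the int-based decode actually sees:

{{import org.apache.hadoop.hbase.util.Bytes

object LongDecodeDemo {
  def main(args: Array[String]): Unit = {
    val stored = Bytes.toBytes(50L)   // cell value, big-endian: 00 00 00 00 00 00 00 32
    val filter = Bytes.toBytes(100L)  // filter literal, big-endian: 00 00 00 00 00 00 00 64

    // Bytes.toInt reads only the first four (high-order) bytes, which are all
    // zero for any small non-negative long, so 50 and 100 compare as equal.
    println(Bytes.toInt(stored, 0).compareTo(Bytes.toInt(filter, 0)))   // prints 0
    // Bytes.toLong reads all eight bytes and compares the actual values.
    println(Bytes.toLong(stored, 0).compareTo(Bytes.toLong(filter, 0))) // prints -1
  }
}}}

This matches both symptoms: with equal high-order words, {{"v > 100"}} never matches, while {{"v >= 100"}} matches rows regardless of their actual value.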

It looks like the error has gone unnoticed because {{DynamicLogicExpressionSuite}} lacks test cases with long values.

The erroneous code is also present in the master branch. We have extended the test suite, implemented a quick fix, and will open a PR on GitHub.
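
The quick fix presumably amounts to decoding the full eight bytes; a sketch of the change in {{NaiveEncoder.filter}} (the actual PR may differ in details):

{{case LongEnc | TimestampEnc =>
   // read all eight bytes of the big-endian long rather than just the high word
   val in = Bytes.toLong(input, offset1)
   val value = Bytes.toLong(filterBytes, offset2 + 1)
   compare(in.compareTo(value), ops)}}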



--
This message was sent by Atlassian Jira
(v8.3.4#803005)