Posted to issues@hbase.apache.org by "Hristo Iliev (Jira)" <ji...@apache.org> on 2021/08/20 14:41:00 UTC
[jira] [Created] (HBASE-26211) [hbase-connectors] Pushdown filters in Spark do not work correctly with long types
Hristo Iliev created HBASE-26211:
------------------------------------
Summary: [hbase-connectors] Pushdown filters in Spark do not work correctly with long types
Key: HBASE-26211
URL: https://issues.apache.org/jira/browse/HBASE-26211
Project: HBase
Issue Type: Bug
Components: hbase-connectors
Affects Versions: 1.0.0
Reporter: Hristo Iliev
Reading from an HBase table and filtering on a LONG column does not seem to work correctly.
{{Dataset<Row> df = spark.read()
    .format("org.apache.hadoop.hbase.spark")
    .option("hbase.columns.mapping", "id STRING :key, v LONG cf:v")
    ...
    .load();
df.filter("v > 100").show();}}
Expected behaviour is to show rows where cf:v > 100, but instead an empty dataset is shown.
Moreover, replacing {{"v > 100"}} with {{"v >= 100"}} results in a dataset where some rows have values of v less than 100.
The problem appears to be that long values are decoded incorrectly as integers in {{NaiveEncoder.filter}}:
{{case LongEnc | TimestampEnc =>
  val in = Bytes.toInt(input, offset1)
  val value = Bytes.toInt(filterBytes, offset2 + 1)
  compare(in.compareTo(value), ops)}}
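To illustrate why decoding with {{Bytes.toInt}} produces both symptoms, here is a minimal, self-contained sketch (no HBase dependency; {{ByteBuffer}} is used to mimic the big-endian behaviour of {{Bytes.toBytes}}/{{Bytes.toInt}}/{{Bytes.toLong}}). For small long values the 4 high-order bytes are all zero, so both the cell value and the filter operand decode to 0: "v > 100" then rejects every row, while "v >= 100" accepts every row, including ones with v < 100.

```java
import java.nio.ByteBuffer;

public class LongFilterBug {
    // Mimics org.apache.hadoop.hbase.util.Bytes.toInt: reads 4 big-endian bytes
    static int toInt(byte[] b, int off) {
        return ByteBuffer.wrap(b, off, 4).getInt();
    }

    // Mimics Bytes.toLong: reads 8 big-endian bytes
    static long toLong(byte[] b, int off) {
        return ByteBuffer.wrap(b, off, 8).getLong();
    }

    // Mimics Bytes.toBytes(long): 8-byte big-endian encoding
    static byte[] longBytes(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    public static void main(String[] args) {
        byte[] cell = longBytes(150L);   // stored cell value cf:v = 150
        byte[] filter = longBytes(100L); // filter operand 100

        // Buggy decode: only the 4 high-order bytes are read, which are
        // all zero for small longs, so both sides compare as 0.
        System.out.println(toInt(cell, 0));   // prints 0, not 150
        System.out.println(toInt(filter, 0)); // prints 0, not 100

        // 0 > 0 is false: "v > 100" filters out every row.
        System.out.println(toInt(cell, 0) > toInt(filter, 0));

        // 0 >= 0 is true for every row: "v >= 100" keeps rows with v < 100.
        System.out.println(toInt(cell, 0) >= toInt(filter, 0));

        // Correct decode reads all 8 bytes and compares properly.
        System.out.println(toLong(cell, 0) > toLong(filter, 0)); // true
    }
}
```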
It looks like this error has gone unnoticed because {{DynamicLogicExpressionSuite}} lacks test cases with long values.
The erroneous code is also present in the master branch. We have extended the test suite, implemented a quick fix, and will open a PR on GitHub.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)