You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "XIANG,CHAO-ZHU (Jira)" <ji...@apache.org> on 2022/10/10 16:07:00 UTC
[jira] [Updated] (PHOENIX-6808) query filter cannot get correct result
[ https://issues.apache.org/jira/browse/PHOENIX-6808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
XIANG,CHAO-ZHU updated PHOENIX-6808:
------------------------------------
Attachment: image-2022-10-11-00-06-32-190.png
> query filter cannot get correct result
> --------------------------------------
>
> Key: PHOENIX-6808
> URL: https://issues.apache.org/jira/browse/PHOENIX-6808
> Project: Phoenix
> Issue Type: Bug
> Components: 4.x, core
> Affects Versions: 4.14.1
> Environment: hbase-1.4
> phoenix-4.14.1-HBase-1.4
> Reporter: XIANG,CHAO-ZHU
> Priority: Major
> Attachments: image-2022-10-10-18-41-09-059.png, image-2022-10-10-22-41-51-879.png, image-2022-10-10-23-10-01-059.png, image-2022-10-11-00-06-32-190.png
>
>
> create table && upsert values
>
> {code:java}
> CREATE TABLE IF NOT EXISTS "TEST_TABLE" (
> "K" VARCHAR PRIMARY KEY,
> "f"."JR1" UNSIGNED_INT DEFAULT 0,
> "f"."JR2" UNSIGNED_INT DEFAULT 0,
> "f"."ZN" UNSIGNED_INT DEFAULT 0,
> "f"."AA" UNSIGNED_INT,
> "f"."AB" UNSIGNED_INT DEFAULT 0,
> "f"."AC" INTEGER DEFAULT 0,
> "f"."AD" INTEGER
> ) column_encoded_bytes=0 SPLIT ON (0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
> UPSERT INTO TEST_TABLE(K, JR1, JR2, ZN, AA, AB, AC, AD) VALUES('5:xiang:346358075:370562729', 270, null, 10846, null, null, null, null);
> UPSERT INTO TEST_TABLE(K, JR1, JR2, ZN, AA, AB, AC, AD) VALUES('0:chaozhu:123123120:321321321' , 123, null, 10846, 23303, 123, 321, 111);
> {code}
> then execute query:
>
> {code:java}
> select * from "TEST_TABLE" where ( "ZN" in (10846) and ("JR1" in (2303) or "JR2" in (10846) ));{code}
> result:
> !image-2022-10-10-18-41-09-059.png!
> Above, neither `JR1` nor `JR2` matches query statement but result contains 2 rows.
> And execute query (change the order):
>
> {code:java}
> select * from "TEST_TABLE" where (("JR1" in (2303) or "JR2" in (10846) ) and "ZN" in (10846) ); {code}
> result:
>
> !image-2022-10-10-22-24-52-310.png!
> Or execute query ( "ZN" in (10846) -> "AB" in ( 123 )):
>
> {code:java}
> select * from "TEST_TABLE" where ( "AB" in (123) and ("JR1" in (2303) or "JR2" in (10846) ));
> {code}
> result:
>
> !image-2022-10-10-23-10-01-059.png!
>
> I try to debug it by docker and remote debug. Maybe I know the cause of this problem.
>
> In {*}ComparisonExpression.java{*}(org/apache/phoenix/expression/ComparisonExpression.java)
>
> {code:java}
> @Override
> public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
> if (!children.get(0).evaluate(tuple, ptr)) {
> return false;
> }
> if (ptr.getLength() == 0) { // null comparison evals to null
> return true;
> }
> ....
> }{code}
> because `JR2` value is null, so `ptr.getLength() == 0` is true then return true directly.
>
> In *AndOrExpression.java* (org/apache/phoenix/expression/AndOrExpression.java)
> {code:java}
> public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
> boolean isNull = false;
> for (int i = 0; i < children.size(); i++) {
> Expression child = children.get(i);
> // If partial state is available, then use that to know we've already evaluated this
> // child expression and do not need to do so again.
> if (partialEvalState == null || !partialEvalState.get(i)) {
> // Call through to child evaluate method matching parent call to allow child to optimize
> // evaluate versus getValue code path.
> if (child.evaluate(tuple, ptr)) {
> // Short circuit if we see our stop value
> if (isStopValue((Boolean) PBoolean.INSTANCE.toObject(ptr, child.getDataType()))) {
> return true;
> } else if (partialEvalState != null) {
> partialEvalState.set(i);
> }
> } else {
> isNull = true;
> }
> }
> }
> if (isNull) {
> return false;
> }
> return true;
> } {code}
>
> because evaluate `JR2` return true so `("JR1" in (2303) or "JR2" in (10846))` is true.
>
> In *BooleanExpressionFilter.Java* (org/apache/phoenix/filter/BooleanExpressionFilter.java)
>
> {code:java}
> @Override
> public String toString() {
> return expression.toString();
> } @edu.umd.cs.findbugs.annotations.SuppressWarnings(
> value="NP_BOOLEAN_RETURN_NULL",
> justification="Returns null by design.")
> protected Boolean evaluate(Tuple input) {
> try {
> if (!expression.evaluate(input, tempPtr)) {
> return null;
> }
> } catch (IllegalDataException e) {
> return Boolean.FALSE;
> }
> return (Boolean)expression.getDataType().toObject(tempPtr);
> }{code}
> now `(Boolean)expression.getDataType().toObject(tempPtr) = true` ( `"ZN" in (10846) is true` cause ) **
>
> In *MultiKeyValueComparisonFilter.java* (org/apache/phoenix/filter/MultiKeyValueComparisonFilter.java)
>
> {code:java}
> public ReturnCode filterKeyValue(Cell cell) {
> ...
> // We found a new column, so we can re-evaluate
> // TODO: if we have row key columns in our expression, should
> // we always evaluate or just wait until the end?
> this.matchedColumn = this.evaluate(inputTuple);
> if (this.matchedColumn == null) {
> if (inputTuple.isImmutable()) {
> this.matchedColumn = Boolean.FALSE;
> } else {
> return ReturnCode.INCLUDE_AND_NEXT_COL;
> }
> }
> ...
> }
> public boolean filterRow() {
> if (this.matchedColumn == null && !inputTuple.isImmutable() && expression.requiresFinalEvaluation()) {
> inputTuple.setImmutable();
> this.matchedColumn = this.evaluate(inputTuple);
> }
>
> return ! (Boolean.TRUE.equals(this.matchedColumn));
> }
> {code}
> Above, filterKeyValue -> this.matchedColumn = true -> filterRow -> return false -> not filter row
>
>
> Thus, the reason of query:
>
> {code:java}
> select * from "TEST_TABLE" where (("JR1" in (2303) or "JR2" in (10846) ) and "ZN" in (10846) ); {code}
> `("JR1" in (2303) or "JR2" in (10846)` is true and `isStopValue((Boolean) PBoolean.INSTANCE.toObject(ptr, child.getDataType())) = true` -> `(Boolean)expression.getDataType().toObject(tempPtr) = null` ->
> this.matchedColumn = false -> filterRow -> return true -> filter row
>
> And, the reason of query:
>
> {code:java}
> select * from "TEST_TABLE" where ( "AB" in (123) and ("JR1" in (2303) or "JR2" in (10846) )); {code}
> hbase read cell and filter it by alphabetical order.
> first read "AB" and `"AB" in (123)` is true then `partialEvalState.set(0)`,
> then read "JR1" and set subfilter `partialEvalState.set(0)`
> final read "JR2" -> skip index 0 since partialEvalState.get(0) -> ("JR1" in (2303) or "JR2" in (10846) ) is true and PBoolean.INSTANCE.toObject(ptr, child.getDataType())) = true` -> `(Boolean)expression.getDataType().toObject(tempPtr) = null`
> -> this.matchedColumn = false -> filterRow -> return true -> filter row
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)