You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "XIANG,CHAO-ZHU (Jira)" <ji...@apache.org> on 2022/10/10 16:07:00 UTC
[jira] [Updated] (PHOENIX-6808) query filter cannot get correct result

     [ https://issues.apache.org/jira/browse/PHOENIX-6808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

XIANG,CHAO-ZHU updated PHOENIX-6808:
------------------------------------
    Attachment: image-2022-10-11-00-06-32-190.png

> query filter cannot get correct result
> --------------------------------------
>
>                 Key: PHOENIX-6808
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6808
>             Project: Phoenix
>          Issue Type: Bug
>          Components: 4.x, core
>    Affects Versions: 4.14.1
>         Environment: hbase-1.4
> phoenix-4.14.1-HBase-1.4
>            Reporter: XIANG,CHAO-ZHU
>            Priority: Major
>         Attachments: image-2022-10-10-18-41-09-059.png, image-2022-10-10-22-41-51-879.png, image-2022-10-10-23-10-01-059.png, image-2022-10-11-00-06-32-190.png
>
>
> create table && upsert values
>  
> {code:java}
> CREATE TABLE IF NOT EXISTS "TEST_TABLE" (
> "K" VARCHAR PRIMARY KEY,
> "f"."JR1" UNSIGNED_INT DEFAULT 0,
> "f"."JR2" UNSIGNED_INT DEFAULT 0,
> "f"."ZN" UNSIGNED_INT DEFAULT 0,
> "f"."AA" UNSIGNED_INT,
> "f"."AB" UNSIGNED_INT DEFAULT 0,
> "f"."AC" INTEGER DEFAULT 0,
> "f"."AD" INTEGER
> ) column_encoded_bytes=0 SPLIT ON (0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
> UPSERT INTO TEST_TABLE(K, JR1, JR2, ZN, AA, AB, AC, AD) VALUES('5:xiang:346358075:370562729', 270, null, 10846, null, null, null, null);
> UPSERT INTO TEST_TABLE(K, JR1, JR2, ZN, AA, AB, AC, AD) VALUES('0:chaozhu:123123120:321321321' ,  123, null,  10846, 23303, 123, 321, 111);
>  {code}
> then execute query:
>  
> {code:java}
> select * from "TEST_TABLE" where ( "ZN" in (10846) and ("JR1" in (2303) or "JR2" in (10846) ));{code}
> result:
> !image-2022-10-10-18-41-09-059.png!
> Above,  neither `JR1`  nor `JR2` matches query statement but result contains 2 rows.
> And execute query (change the order): 
>  
> {code:java}
> select * from "TEST_TABLE" where (("JR1" in (2303) or "JR2" in (10846) ) and  "ZN" in (10846) ); {code}
> result:
>  
> !image-2022-10-10-22-24-52-310.png!
> Or execute query ( "ZN" in (10846) -> "AB" in ( 123 )):
>  
> {code:java}
> select * from "TEST_TABLE" where ( "AB" in (123) and ("JR1" in (2303) or "JR2" in (10846) ));
> {code}
> result:
>  
> !image-2022-10-10-23-10-01-059.png!
>  
> I try to debug it by docker and remote debug. Maybe I know the cause of this problem.
>  
> In {*}ComparisonExpression.java{*}(org/apache/phoenix/expression/ComparisonExpression.java)
>  
> {code:java}
> @Override
> public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
>     if (!children.get(0).evaluate(tuple, ptr)) {
>         return false;
>     }
>     if (ptr.getLength() == 0) { // null comparison evals to null
>        return true;
>     } 
> ....
> }{code}
> because `JR2` value is null, so `ptr.getLength() == 0` is true then return true directly.
>  
> In *AndOrExpression.java* (org/apache/phoenix/expression/AndOrExpression.java)
> {code:java}
> public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
>     boolean isNull = false;
>     for (int i = 0; i < children.size(); i++) {
>         Expression child = children.get(i);
>         // If partial state is available, then use that to know we've already evaluated this
>         // child expression and do not need to do so again.
>         if (partialEvalState == null || !partialEvalState.get(i)) {
>             // Call through to child evaluate method matching parent call to allow child to optimize
>             // evaluate versus getValue code path.
>             if (child.evaluate(tuple, ptr)) {
>                 // Short circuit if we see our stop value
>                 if (isStopValue((Boolean) PBoolean.INSTANCE.toObject(ptr, child.getDataType()))) {
>                     return true;
>                 } else if (partialEvalState != null) {
>                     partialEvalState.set(i);
>                 }
>             } else {
>                 isNull = true;
>             }
>         }
>     }
>     if (isNull) {
>         return false;
>     }
>     return true;
> } {code}
>  
> because evaluate `JR2` return true so `("JR1" in (2303) or "JR2" in (10846))` is true.
>  
> In *BooleanExpressionFilter.Java*   (org/apache/phoenix/filter/BooleanExpressionFilter.java)
>  
> {code:java}
>     @Override
>     public String toString() {
>         return expression.toString();
>     }    @edu.umd.cs.findbugs.annotations.SuppressWarnings(
>             value="NP_BOOLEAN_RETURN_NULL",
>             justification="Returns null by design.")
>     protected Boolean evaluate(Tuple input) {
>         try {
>             if (!expression.evaluate(input, tempPtr)) {
>                 return null;
>             }
>         } catch (IllegalDataException e) {
>             return Boolean.FALSE;
>         }
>         return (Boolean)expression.getDataType().toObject(tempPtr);
>     }{code}
> now  `(Boolean)expression.getDataType().toObject(tempPtr) = true` ( `"ZN" in (10846) is true` cause ) ** 
>  
> In *MultiKeyValueComparisonFilter.java* (org/apache/phoenix/filter/MultiKeyValueComparisonFilter.java)
>  
> {code:java}
> public ReturnCode filterKeyValue(Cell cell) {
>     ...
>     // We found a new column, so we can re-evaluate
>     // TODO: if we have row key columns in our expression, should
>     // we always evaluate or just wait until the end?
>     this.matchedColumn = this.evaluate(inputTuple);
>     if (this.matchedColumn == null) {
>         if (inputTuple.isImmutable()) {
>             this.matchedColumn = Boolean.FALSE;
>         } else {
>             return ReturnCode.INCLUDE_AND_NEXT_COL;
>         }
>     } 
> ...
> }
> public boolean filterRow() {
>     if (this.matchedColumn == null && !inputTuple.isImmutable() && expression.requiresFinalEvaluation()) {
>         inputTuple.setImmutable();
>         this.matchedColumn = this.evaluate(inputTuple);
>     }
>     
>     return ! (Boolean.TRUE.equals(this.matchedColumn));
> }
> {code}
> Above, filterKeyValue -> this.matchedColumn = true -> filterRow -> return false -> not filter row
>  
>  
> Thus, the reason of query:
>  
> {code:java}
> select * from "TEST_TABLE" where (("JR1" in (2303) or "JR2" in (10846) ) and  "ZN" in (10846) );  {code}
> `("JR1" in (2303) or "JR2" in (10846)` is true and `isStopValue((Boolean) PBoolean.INSTANCE.toObject(ptr, child.getDataType())) = true` -> `(Boolean)expression.getDataType().toObject(tempPtr) = null` ->  
>  this.matchedColumn = false -> filterRow -> return true -> filter row
>  
> And, the reason of query:
>  
> {code:java}
> select * from "TEST_TABLE" where ( "AB" in (123) and ("JR1" in (2303) or "JR2" in (10846) )); {code}
> hbase read cell and filter it by alphabetical order.  
> first read "AB" and `"AB" in (123)` is true then `partialEvalState.set(0)`,
> then read "JR1" and set subfilter `partialEvalState.set(0)`
> final read "JR2" -> skip index 0 since partialEvalState.get(0)  ->  ("JR1" in (2303) or "JR2" in (10846) ) is true and PBoolean.INSTANCE.toObject(ptr, child.getDataType())) = true` -> `(Boolean)expression.getDataType().toObject(tempPtr) = null`
> ->   this.matchedColumn = false -> filterRow -> return true -> filter row
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)