You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@phoenix.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/12/01 12:05:01 UTC

[jira] [Commented] (PHOENIX-6232) Same query fails with IllegalArgumentException if part of a join

    [ https://issues.apache.org/jira/browse/PHOENIX-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241477#comment-17241477 ] 

ASF GitHub Bot commented on PHOENIX-6232:
-----------------------------------------

comnetwork opened a new pull request #992:
URL: https://github.com/apache/phoenix/pull/992


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Same query fails with IllegalArgumentException if part of a join
> ----------------------------------------------------------------
>
>                 Key: PHOENIX-6232
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6232
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Mate Szalay-Beko
>            Assignee: chenglei
>            Priority: Major
>         Attachments: PHOENIX-6232_v1-4.x.patch
>
>
> We were facing an interesting problem when a more complex query (with inner selects in the WHERE clause) succeeds alone, while the same query fails, if it is part of a join. I created a test table / query to reproduce the problem:
> {code:sql}
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (
>       id INTEGER NOT NULL,
>       test_id INTEGER,
>       lastchanged TIMESTAMP,
>       CONSTRAINT my_pk PRIMARY KEY (id));
> UPSERT INTO test VALUES(0, 100, '2000-01-01 00:00:00.0');
> UPSERT INTO test VALUES(1, 101, '2000-01-01 00:00:00.0');
> UPSERT INTO test VALUES(2, 100, '2011-11-11 11:11:11.0');
> {code}
> *Query 1:* Example query, running fine in itself:
> {code:sql}
> SELECT id, test_id, lastchanged FROM test T
> WHERE lastchanged = ( SELECT max(lastchanged) FROM test WHERE test_id = T.test_id )
> Returns:
> +----+---------+-----------------------+
> | ID | TEST_ID |      LASTCHANGED      |
> +----+---------+-----------------------+
> | 1  | 101     | 2000-01-01 01:00:00.0 |
> | 2  | 100     | 2011-11-11 12:11:11.0 |
> +----+---------+-----------------------+
> {code}
> *Query 2:* Same query fails on the current master branch, when it is part of a larger (implicit) join:
> {code:sql}
> SELECT AAA.*
> FROM 
> (
>   SELECT id, test_id, lastchanged FROM test T
>   WHERE lastchanged = ( SELECT max(lastchanged) FROM test WHERE test_id = T.test_id )
> ) as AAA,
> (
>   SELECT id FROM test
> ) as BBB
> WHERE AAA.id = BBB.id;
> java.lang.IllegalArgumentException
> 	at org.apache.phoenix.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:128)
> 	at org.apache.phoenix.compile.TupleProjectionCompiler.createProjectedTable(TupleProjectionCompiler.java:66)
> 	at org.apache.phoenix.compile.QueryCompiler.compileSingleFlatQuery(QueryCompiler.java:663)
> 	at org.apache.phoenix.compile.QueryCompiler.compileJoinQuery(QueryCompiler.java:404)
> 	at org.apache.phoenix.compile.QueryCompiler.compileJoinQuery(QueryCompiler.java:302)
> 	at org.apache.phoenix.compile.QueryCompiler.compileSelect(QueryCompiler.java:249)
> 	at org.apache.phoenix.compile.QueryCompiler.compile(QueryCompiler.java:176)
> 	at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:504)
> 	at org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:467)
> 	at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:309)
> 	at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:298)
> 	at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
> 	at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:297)
> 	at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:290)
> 	at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1933)
> 	at sqlline.Commands.executeSingleQuery(Commands.java:1054)
> 	at sqlline.Commands.execute(Commands.java:1003)
> 	at sqlline.Commands.sql(Commands.java:967)
> 	at sqlline.SqlLine.dispatch(SqlLine.java:734)
> 	at sqlline.SqlLine.begin(SqlLine.java:541)
> 	at sqlline.SqlLine.start(SqlLine.java:267)
> 	at sqlline.SqlLine.main(SqlLine.java:206)
> {code}
> I am not sure what the problem is exactly. My guess is that Phoenix tries to optimize (flatten) an inner-query, which it shouldn't, if we are inside a join (according to the check in the code which throws the exception).
> The best workaround I found was to define an explicit join in the original query (Query 1), basically change the inner select into a join. This modified query return the same as the original one:
> *Query 3:*
> {code:sql}
> SELECT T.id, T.test_id, T.lastchanged 
> FROM 
>   test T 
>   LEFT JOIN (
>     SELECT max(lastchanged) AS max_timestamp, 
>            test_id AS max_timestamp_test_id
>     FROM test
>     GROUP BY test_id
>   ) JOIN_TABLE ON JOIN_TABLE.max_timestamp_test_id = T.test_id
> WHERE T.lastchanged = JOIN_TABLE.max_timestamp
> Returns:
> +------+-----------+-----------------------+
> | T.ID | T.TEST_ID |     T.LASTCHANGED     |
> +------+-----------+-----------------------+
> | 1    | 101       | 2000-01-01 01:00:00.0 |
> | 2    | 100       | 2011-11-11 12:11:11.0 |
> +------+-----------+-----------------------+
> {code}
> *Query 4:* And the same modified query (query 3) now works inside a join:
> {code:sql}
> SELECT AAA.*
> FROM 
> (
>   SELECT T.id, T.test_id, T.lastchanged 
>   FROM 
>     test T 
>     LEFT JOIN (
>       SELECT max(lastchanged) AS max_timestamp, 
>              test_id AS max_timestamp_test_id
>       FROM test
>       GROUP BY test_id
>     ) JOIN_TABLE ON JOIN_TABLE.max_timestamp_test_id = T.test_id
>   WHERE T.lastchanged = JOIN_TABLE.max_timestamp
> ) as AAA,
> (
>   SELECT id FROM test
> ) as BBB
> WHERE AAA.id = BBB.id;
> Returns:
> +------+-----------+-----------------------+
> | T.ID | T.TEST_ID |     T.LASTCHANGED     |
> +------+-----------+-----------------------+
> | 1    | 101       | 2000-01-01 01:00:00.0 |
> | 2    | 100       | 2011-11-11 12:11:11.0 |
> +------+-----------+-----------------------+
> {code}
> I think Query 4 worked, as it is forcing Phoenix to drop the idea of optimizing it's inner-query (Query 3). Although, I can be wrong about the root cause...
> Anyway, I think the bug should be fixed and Query 2 should run without exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)