You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "Marcin Januszkiewicz (JIRA)" <ji...@apache.org> on 2017/10/31 07:58:00 UTC
[jira] [Created] (PHOENIX-4336) SELECT + ORDER BY performance much
worse than self-join
Marcin Januszkiewicz created PHOENIX-4336:
---------------------------------------------
Summary: SELECT + ORDER BY performance much worse than self-join
Key: PHOENIX-4336
URL: https://issues.apache.org/jira/browse/PHOENIX-4336
Project: Phoenix
Issue Type: Bug
Reporter: Marcin Januszkiewicz
We have a wide table with 100M records created with the following DDL:
{code:sql}
CREATE TABLE traces (
rowkey VARCHAR PRIMARY KEY,
time VARCHAR,
number VARCHAR,
+40 more columns)
CREATE LOCAL INDEX ix_0 ON traces (UPPER(number)) INCLUDE (time, + some other columns used for filtering)
{code}
We want to select into a large (~30M records) subset of this data with the query:
{code:sql}
SELECT *all columns*
FROM traces
WHERE (UPPER(number) LIKE 'PO %')
ORDER BY time DESC, ROWKEY
LIMIT 101;
{code}
{noformat}
+-------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
+-------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
| CLIENT 234-CHUNK 39850892 ROWS 73610048115 BYTES PARALLEL 234-WAY RANGE SCAN OVER TRACES [1,'PO '] - [1,'PO!'] | 73610048115 | 39850892 | 1509102519122 |
| SERVER TOP 101 ROWS SORTED BY [cf."time" DESC, "ROWKEY"] | 73610048115 | 39850892 | 1509102519122 |
| CLIENT MERGE SORT | 73610048115 | 39850892 | 1509102519122 |
| CLIENT LIMIT 101 | 73610048115 | 39850892 | 1509102519122 |
+-------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
{noformat}
This times out after 15 minutes and puts a huge load on our cluster.
We have an alternate way of selecting this data:
{code:sql}
SELECT t.rowkey, *all columns*
FROM TRACES t
JOIN (
SELECT rowkey
FROM TRACES
WHERE (UPPER(number) LIKE 'PO %')
ORDER BY time DESC, ROWKEY
LIMIT 101
) ix
ON t.ROWKEY = ix.ROWKEY
order by t.ROWKEY;
{code}
{noformat}
+---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+-----------+
| PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO |
+---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+-----------+
| CLIENT 1500-CHUNK 97154640 ROWS 462422573830 BYTES PARALLEL 5-WAY FULL SCAN OVER TRACES | 73610048115 | 39850892 | 150910251 |
| CLIENT MERGE SORT | 73610048115 | 39850892 | 150910251 |
| PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) | 73610048115 | 39850892 | 150910251 |
| CLIENT 234-CHUNK 39850892 ROWS 73610048115 BYTES PARALLEL 234-WAY RANGE SCAN OVER TRACES [1,'PO '] - [1,'PO!'] | 73610048115 | 39850892 | 150910251 |
| SERVER TOP 101 ROWS SORTED BY [cf."time" DESC, "ROWKEY"] | 73610048115 | 39850892 | 150910251 |
| CLIENT MERGE SORT | 73610048115 | 39850892 | 150910251 |
| CLIENT LIMIT 101 | 73610048115 | 39850892 | 150910251 |
| DYNAMIC SERVER FILTER BY T.ROWKEY IN (IX.ROWKEY) | 73610048115 | 39850892 | 150910251 |
+---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+-----------+
{noformat}
Which completes in just under a minute.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)