You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@phoenix.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2018/10/10 18:55:00 UTC
[jira] [Commented] (PHOENIX-4336) SELECT + ORDER BY performance
much worse than self-join
[ https://issues.apache.org/jira/browse/PHOENIX-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645413#comment-16645413 ]
Lars Hofhansl commented on PHOENIX-4336:
----------------------------------------
Just came here looking for something else.
HBase now the ROW_INDEX_V1 encoder. FAST_DIFF is not good for random GETs (done when we retrieve the remainder of the row), I'd recommend changing the encoding to ROW_INDEX_V1 and using block compression (zSTD is a great compromise between compression and speed)
> SELECT + ORDER BY performance much worse than self-join
> -------------------------------------------------------
>
> Key: PHOENIX-4336
> URL: https://issues.apache.org/jira/browse/PHOENIX-4336
> Project: Phoenix
> Issue Type: Bug
> Reporter: Marcin Januszkiewicz
> Priority: Major
>
> We have a wide table with 100M records created with the following DDL:
> {code:sql}
> CREATE TABLE traces (
> rowkey VARCHAR PRIMARY KEY,
> time VARCHAR,
> number VARCHAR,
> +40 more columns)
> CREATE LOCAL INDEX ix_0 ON traces (UPPER(number)) INCLUDE (time, + some other columns used for filtering)
> {code}
> We want to select into a large (~30M records) subset of this data with the query:
> {code:sql}
> SELECT *all columns*
> FROM traces
> WHERE (UPPER(number) LIKE 'PO %')
> ORDER BY time DESC, ROWKEY
> LIMIT 101;
> {code}
> {noformat}
> +-------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
> +-------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 234-CHUNK 39850892 ROWS 73610048115 BYTES PARALLEL 234-WAY RANGE SCAN OVER TRACES [1,'PO '] - [1,'PO!'] | 73610048115 | 39850892 | 1509102519122 |
> | SERVER TOP 101 ROWS SORTED BY [cf."time" DESC, "ROWKEY"] | 73610048115 | 39850892 | 1509102519122 |
> | CLIENT MERGE SORT | 73610048115 | 39850892 | 1509102519122 |
> | CLIENT LIMIT 101 | 73610048115 | 39850892 | 1509102519122 |
> +-------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> {noformat}
> This times out after 15 minutes and puts a huge load on our cluster.
> We have an alternate way of selecting this data:
> {code:sql}
> SELECT t.rowkey, *all columns*
> FROM TRACES t
> JOIN (
> SELECT rowkey
> FROM TRACES
> WHERE (UPPER(number) LIKE 'PO %')
> ORDER BY time DESC, ROWKEY
> LIMIT 101
> ) ix
> ON t.ROWKEY = ix.ROWKEY
> order by t.ROWKEY;
> {code}
> {noformat}
> +---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO |
> +---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | CLIENT 1500-CHUNK 97154640 ROWS 462422573830 BYTES PARALLEL 5-WAY FULL SCAN OVER TRACES | 73610048115 | 39850892 | 150910251 |
> | CLIENT MERGE SORT | 73610048115 | 39850892 | 150910251 |
> | PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) | 73610048115 | 39850892 | 150910251 |
> | CLIENT 234-CHUNK 39850892 ROWS 73610048115 BYTES PARALLEL 234-WAY RANGE SCAN OVER TRACES [1,'PO '] - [1,'PO!'] | 73610048115 | 39850892 | 150910251 |
> | SERVER TOP 101 ROWS SORTED BY [cf."time" DESC, "ROWKEY"] | 73610048115 | 39850892 | 150910251 |
> | CLIENT MERGE SORT | 73610048115 | 39850892 | 150910251 |
> | CLIENT LIMIT 101 | 73610048115 | 39850892 | 150910251 |
> | DYNAMIC SERVER FILTER BY T.ROWKEY IN (IX.ROWKEY) | 73610048115 | 39850892 | 150910251 |
> +---------------------------------------------------------------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> {noformat}
> Which completes in just under a minute.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)