Posted to issues-all@impala.apache.org by "Noemi Pap-Takacs (Jira)" <ji...@apache.org> on 2023/04/03 16:57:00 UTC
[jira] [Work started] (IMPALA-4530) Sort node after exchange should start sorting after first RowBatch is received
[ https://issues.apache.org/jira/browse/IMPALA-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on IMPALA-4530 started by Noemi Pap-Takacs.
------------------------------------------------
> Sort node after exchange should start sorting after first RowBatch is received
> ------------------------------------------------------------------------------
>
> Key: IMPALA-4530
> URL: https://issues.apache.org/jira/browse/IMPALA-4530
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Mostafa Mokhtar
> Assignee: Noemi Pap-Takacs
> Priority: Major
> Labels: performance
>
> The sort node after an exchange doesn't start sorting until all data has been received, which adds a lot of latency to the query.
> It is not clear whether this optimization would still make sense for a scan followed by a sort when both run in the same thread.
> Query
> {code}
> insert into tpcds_1000_parquet.store_sales_insert partition(ss_sold_date_sk, ss_quantity) /*+ clustered*/
> select
> ss_sold_time_sk,
> ss_item_sk,
> ss_customer_sk,
> ss_cdemo_sk,
> ss_hdemo_sk,
> ss_addr_sk,
> ss_store_sk,
> ss_promo_sk,
> ss_ticket_number,
> ss_wholesale_cost,
> ss_list_price,
> ss_sales_price,
> ss_ext_discount_amt,
> ss_ext_sales_price,
> ss_ext_wholesale_cost,
> ss_ext_list_price,
> ss_ext_tax,
> ss_coupon_amt,
> ss_net_paid,
> ss_net_paid_inc_tax,
> ss_net_profit,
> ss_sold_date_sk,
> ss_quantity
> from store_sales
> {code}
> Plan
> {code}
> WRITE TO HDFS [tpcds_1000_parquet.store_sales_insert, OVERWRITE=false, PARTITION-KEYS=(ss_sold_date_sk,ss_quantity)]
> | partitions=180576
> | hosts=15 per-host-mem=17.88GB
> |
> 02:SORT
> | order by: ss_sold_date_sk DESC NULLS LAST, ss_quantity DESC NULLS LAST
> | hosts=15 per-host-mem=1.45GB
> | tuple-ids=1 row-size=100B cardinality=2879987999
> |
> 01:EXCHANGE [HASH(ss_sold_date_sk,ss_quantity)]
> | hosts=15 per-host-mem=0B
> | tuple-ids=0 row-size=100B cardinality=2879987999
> |
> 00:SCAN HDFS [tpcds_1000_parquet.store_sales, RANDOM]
> partitions=1824/1824 files=1824 size=189.24GB
> table stats: 2879987999 rows total
> column stats: all
> hosts=15 per-host-mem=88.00MB
> tuple-ids=0 row-size=100B cardinality=2879987999
> {code}
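To make the requested behavior concrete, here is a minimal Python sketch of the general idea, not Impala's actual sorter (the Impala backend is C++, and none of these names come from its code). It assumes batches arrive one at a time from the exchange, that rows are hypothetical `(ss_sold_date_sk, ss_quantity)` pairs with no NULLs, and that `run_size` is an arbitrary illustrative threshold. `blocking_sort` mirrors the behavior described in the issue; `incremental_sort` sorts runs as batches arrive and leaves only a merge for the end.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical row shape: (ss_sold_date_sk, ss_quantity), both non-NULL ints.
Row = Tuple[int, int]


def sort_key(row: Row) -> Tuple[int, int]:
    # Descending on both columns, matching the ORDER BY in the plan.
    return (-row[0], -row[1])


def blocking_sort(batches: Iterable[List[Row]]) -> List[Row]:
    """Buffer every batch first, then sort once at the end."""
    rows = [row for batch in batches for row in batch]
    rows.sort(key=sort_key)
    return rows


def incremental_sort(batches: Iterable[List[Row]], run_size: int = 4) -> List[Row]:
    """Sort a run as soon as enough rows have arrived; merge the runs at the end."""
    runs: List[List[Row]] = []
    pending: List[Row] = []
    for batch in batches:
        pending.extend(batch)
        if len(pending) >= run_size:
            # Sorting work overlaps with waiting for the next batch.
            pending.sort(key=sort_key)
            runs.append(pending)
            pending = []
    if pending:
        pending.sort(key=sort_key)
        runs.append(pending)
    # Only the k-way merge of already sorted runs remains after the last batch.
    return list(heapq.merge(*runs, key=sort_key))
```

Both functions return the same ordering; the difference is when the sorting work happens relative to the arrival of the last batch.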
--
This message was sent by Atlassian Jira
(v8.20.10#820010)