You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hawq.apache.org by "Ed Espino (JIRA)" <ji...@apache.org> on 2016/12/22 06:28:58 UTC
[jira] [Resolved] (HAWQ-1183) Writable external table with Hash
distribution shows slow performance
[ https://issues.apache.org/jira/browse/HAWQ-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ed Espino resolved HAWQ-1183.
-----------------------------
Resolution: Fixed
> Writable external table with Hash distribution shows slow performance
> ---------------------------------------------------------------------
>
> Key: HAWQ-1183
> URL: https://issues.apache.org/jira/browse/HAWQ-1183
> Project: Apache HAWQ
> Issue Type: Bug
> Reporter: Paul Guo
> Assignee: Paul Guo
> Fix For: 2.1.0.0-incubating
>
>
> Steps:
> 1. Create tables and populate them.
> drop table tbl1;
> drop external table ext_tbl1;
> drop external table ext_tbl1_random;
> CREATE TABLE tbl1 (a int, b text) DISTRIBUTED BY (a);
> INSERT INTO tbl1 VALUES (generate_series(1,1000),'aaa');
> INSERT INTO tbl1 VALUES (generate_series(1,10000),'bbb');
> INSERT INTO tbl1 VALUES (generate_series(1,100000),'bbc');
> INSERT INTO tbl1 VALUES (generate_series(1,1000000),'bdbc');
> INSERT INTO tbl1 VALUES (generate_series(1,1000000),'bdddbc');
> CREATE WRITABLE EXTERNAL TABLE ext_tbl1
> ( LIKE tbl1 )
> LOCATION ('gpfdist://127.0.0.1/tbl1.csv')
> FORMAT 'CSV' (DELIMITER ',')
> DISTRIBUTED BY (a);
> CREATE WRITABLE EXTERNAL TABLE ext_tbl1_random
> ( LIKE tbl1 )
> LOCATION ('gpfdist://127.0.0.1/tbl1.random.csv')
> FORMAT 'CSV' (DELIMITER ',')
> DISTRIBUTED RANDOMLY;
> 2. Write the two external tables. We can find that the external table with hash distribution is slow with inserting, and plan shows that it has 1 workers only.
> postgres=# explain analyze INSERT INTO ext_tbl1 SELECT * from tbl1;
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> --------------------------------------------------------------------
> Insert (cost=0.00..509.20 rows=1000 width=8)
> Rows out: Avg 2111000.0 rows x 1 workers. Max/Last(seg0:host67/seg0:host67) 2111000/2111000 rows with 17/17 ms to first row, 20145/20145 ms to end, start offset by 18/18 ms.
> Executor memory: 1K bytes.
> -> Result (cost=0.00..431.07 rows=1000 width=20)
> Rows out: Avg 2111000.0 rows x 1 workers. Max/Last(seg0:host67/seg0:host67) 2111000/2111000 rows with 14/14 ms to first row, 1919/1919 ms to end, start offset by 18/18 ms
> .
> -> Redistribute Motion 1:1 (slice1; segments: 1) (cost=0.00..431.05 rows=1000 width=8)
> Hash Key: tbl1.a
> Rows out: Avg 2111000.0 rows x 1 workers at destination. Max/Last(seg0:host67/seg0:host67) 2111000/2111000 rows with 14/14 ms to first row, 1273/1273 ms to end, sta
> rt offset by 18/18 ms.
> -> Table Scan on tbl1 (cost=0.00..431.01 rows=1000 width=8)
> Rows out: Avg 2111000.0 rows x 1 workers. Max/Last(seg0:host67/seg0:host67) 2111000/2111000 rows with 13/13 ms to first row, 447/447 ms to end, start offset b
> y 18/18 ms.
> Slice statistics:
> (slice0) Executor memory: 293K bytes (seg0:host67).
> (slice1) Executor memory: 303K bytes (seg0:host67).
> Statement statistics:
> Memory used: 262144K bytes
> Optimizer status: PQO version 1.684
> Dispatcher statistics:
> executors used(total/cached/new connection): (2/0/2); dispatcher time(total/connection/dispatch data): (13.138 ms/12.628 ms/0.061 ms).
> dispatch data time(max/min/avg): (0.034 ms/0.025 ms/0.029 ms); consume executor data time(max/min/avg): (0.098 ms/0.036 ms/0.067 ms); free executor time(max/min/avg): (0.000 ms/0
> .000 ms/0.000 ms).
> Data locality statistics:
> data locality ratio: 1.000; virtual segment number: 1; different host number: 1; virtual segment number per host(avg/min/max): (1/1/1); segment size(avg/min/max): (46023656.000 B
> /46023656 B/46023656 B); segment size with penalty(avg/min/max): (46023656.000 B/46023656 B/46023656 B); continuity(avg/min/max): (1.000/1.000/1.000); DFS metadatacache: 27.930 ms;
> resource allocation: 11.879 ms; datalocality calculation: 0.207 ms.
> Total runtime: 20356.994 ms
> (22 rows)
> postgres=#
> postgres=# explain analyze INSERT INTO ext_tbl1_random SELECT * from tbl1;
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> ------------------------------------------
> Insert (cost=0.00..444.03 rows=167 width=8)
> Rows out: Avg 351833.3 rows x 6 workers. Max/Last(seg2:host67/seg5:host67) 351984/351854 rows with 61/51 ms to first row, 4731/4767 ms to end, start offset by 67/75 ms.
> Executor memory: 1K bytes avg, 1K bytes max (seg5:host67).
> -> Result (cost=0.00..431.01 rows=167 width=20)
> Rows out: Avg 351833.3 rows x 6 workers. Max/Last(seg2:host67/seg1:host67) 351984/351734 rows with 35/29 ms to first row, 616/705 ms to end, start offset by 67/77 ms.
> -> Redistribute Motion 6:6 (slice1; segments: 6) (cost=0.00..431.01 rows=167 width=8)
> Rows out: Avg 351833.3 rows x 6 workers at destination. Max/Last(seg2:host67/seg1:host67) 351984/351734 rows with 35/29 ms to first row, 439/485 ms to end, start of
> fset by 67/77 ms.
> -> Table Scan on tbl1 (cost=0.00..431.00 rows=167 width=8)
> Rows out: Avg 351833.3 rows x 6 workers. Max/Last(seg5:host67/seg4:host67) 351916/351801 rows with 45/34 ms to first row, 134/146 ms to end, start offset by 7
> 4/82 ms.
> Slice statistics:
> (slice0) Executor memory: 293K bytes avg x 6 workers, 293K bytes max (seg5:host67).
> (slice1) Executor memory: 384K bytes avg x 6 workers, 384K bytes max (seg5:host67).
> Statement statistics:
> Memory used: 262144K bytes
> Optimizer status: PQO version 1.684
> Dispatcher statistics:
> executors used(total/cached/new connection): (12/2/10); dispatcher time(total/connection/dispatch data): (68.122 ms/1480579527424.677 ms/3.747 ms).
> dispatch data time(max/min/avg): (0.203 ms/0.004 ms/0.029 ms); consume executor data time(max/min/avg): (0.052 ms/0.007 ms/0.017 ms); free executor time(max/min/avg): (0.000 ms/0
> .000 ms/0.000 ms).
> Data locality statistics:
> data locality ratio: 1.000; virtual segment number: 6; different host number: 1; virtual segment number per host(avg/min/max): (6/6/6); segment size(avg/min/max): (7670609.333 B/
> 7668464 B/7672344 B); segment size with penalty(avg/min/max): (0.000 B/0 B/0 B); continuity(avg/min/max): (1.000/1.000/1.000); DFS metadatacache: 0.104 ms; resource allocation: 0.67
> 1 ms; datalocality calculation: 0.109 ms.
> Total runtime: 4884.780 ms
> (21 rows)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)