Posted to dev@hawq.apache.org by Hubert Zhang <hz...@pivotal.io> on 2016/12/02 02:53:37 UTC

Re: Random tables loaded from files

Hi Jon,
1. Your problem: the vseg number for your two queries ("explain analyze
select count(*) from tpcds.date_dim;" and "explain analyze select count(*)
from tpcds_opt.date_dim;") depends on the distribution type and the size of
each table.
Could you give us more details about your tables by running "select * from
gp_distribution_policy where localoid = <oid of your table>;"?
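If it helps, here is a rough sketch of that catalog check (the join through
pg_class is only there so you don't have to look up the oid by hand, and the
bucketnum column is as exposed by the HAWQ 2.x catalog; adjust if your build
differs):

  -- Show the distribution policy for both date_dim tables.
  -- attrnums is NULL for randomly distributed tables, otherwise it lists
  -- the hash key columns; bucketnum is the bucket count for hash tables.
  SELECT n.nspname, c.relname, p.attrnums, p.bucketnum
  FROM   gp_distribution_policy p
  JOIN   pg_class c     ON c.oid = p.localoid
  JOIN   pg_namespace n ON n.oid = c.relnamespace
  WHERE  c.relname = 'date_dim';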

2. Questions about cluster expansion.
    For newly created hash tables, the bucket number depends on the current
value of the default_hash_table_bucket_number GUC, which we recommend
updating after the cluster is expanded, so there is no side effect on newly
created hash tables.
    For old hash tables, the bucket number remains the old value of
default_hash_table_bucket_number. After expanding the cluster, for example
from 16 nodes to 64 nodes, the old value of default_hash_table_bucket_number
is 16*6=96, while the new value is 64*6=384. A query such as "select
count(*) from old_hash_table" will only use 96 vsegs instead of 384. One
option is to reload the old hash tables; the other is to use random tables
from the beginning. (This is why random distribution is the default in
HAWQ 2.x, whereas hash was the default in HAWQ 1.x.)
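A minimal sketch of the reload option (the table names are placeholders, not
a prescription):

  -- Rebuild an old hash table as a randomly distributed table so its
  -- queries can use the vseg count of the expanded cluster.
  CREATE TABLE old_hash_table_random
      AS SELECT * FROM old_hash_table
      DISTRIBUTED RANDOMLY;

  -- Swap names once the copy has been verified.
  ALTER TABLE old_hash_table        RENAME TO old_hash_table_bak;
  ALTER TABLE old_hash_table_random RENAME TO old_hash_table;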

    For random tables, the vseg number is determined by the table size. For
small tables (for example, data size less than one HDFS block), only one
vseg will be used; for big tables, the upper bound on the number of vsegs is
hawq_rm_nvseg_perquery_perseg_limit * #segments.
Expanding the cluster increases the number of segments (for example, going
from 16 nodes to 64 nodes, queries on a random table can use 4x as many
vsegs).
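A rough worked example of those bounds (the numbers are only illustrative):

  -- Current per-query, per-segment vseg limit (default 6).
  SHOW hawq_rm_nvseg_perquery_perseg_limit;

  -- Upper bound for a large random table:
  --   hawq_rm_nvseg_perquery_perseg_limit * #segments
  --   e.g. 6 * 16 nodes = 96 vsegs before expansion,
  --        6 * 64 nodes = 384 vsegs after expansion (4x).
  -- A table smaller than one HDFS block (commonly 128 MB) still gets one vseg.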

Thanks
Hubert


On Thu, Dec 1, 2016 at 5:16 AM, Jon Roberts <jr...@pivotal.io> wrote:

> I have a cluster with TPC-DS data and all data is loaded from external
> tables.  hawq_rm_nvseg_perquery_perseg_limit was set to the default of 6
> when the data was loaded and I have 10 nodes so the bucketnum = 60 for all
> tables.  All tables are also randomly distributed.
>
> In my efforts to optimize performance, I've tried increasing
> hawq_rm_nvseg_perquery_perseg_limit to utilize more resources to execute
> the queries.  Unfortunately, this hasn't helped any.
>
> I then noticed this:
>
> *"Random tables are copied from files: #vseg is a fixed value. #vseg is 6,
> when there are sufficient resources."*
> http://hdb.docs.pivotal.io/201/hawq/query/query-performance.html#topic38
>
> It appears that tables loaded from external tables have a fixed number of
> vsegs but tables loaded from other internal tables have a dynamic number.
> For example:
>
> --table loaded from an external table
> gpadmin=# explain analyze select count(*) from tpcds.date_dim;
>
>                               QUERY PLAN
> ---------------------------------------------------------------------------
>  Aggregate  (cost=0.00..431.09 rows=1 width=8)
>    Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(seg-1:ip-172-21-4-229.ec2.internal/seg-1:ip-172-21-4-229.ec2.internal) 1/1 rows with 16277/16277 ms to end, start offset by 3255/3255 ms.
>    ->  Gather Motion 60:1  (slice1; segments: 60)  (cost=0.00..431.09 rows=1 width=8)
>          Rows out:  Avg 60.0 rows x 1 workers at destination.  Max/Last(seg-1:ip-172-21-4-229.ec2.internal/seg-1:ip-172-21-4-229.ec2.internal) 60/60 rows with 39/39 ms to first row, 16277/16277 ms to end, start offset by 3255/3255 ms.
>          ->  Aggregate  (cost=0.00..431.09 rows=1 width=8)
>                Rows out:  Avg 1.0 rows x 60 workers.  Max/Last(seg59:ip-172-21-4-235.ec2.internal/seg11:ip-172-21-4-226.ec2.internal) 1/1 rows with 107/16274 ms to end, start offset by 3257/3257 ms.
>                ->  Table Scan on date_dim  (cost=0.00..431.09 rows=1218 width=1)
>                      Rows out:  Avg 1217.5 rows x 60 workers.  Max/Last(seg51:ip-172-21-4-234.ec2.internal/seg11:ip-172-21-4-226.ec2.internal) 1242/1232 rows with 5035/16273 ms to end, start offset by 3258/3257 ms.
>  Slice statistics:
>    (slice0)    Executor memory: 412K bytes.
>    (slice1)    Executor memory: 215K bytes avg x 60 workers, 215K bytes max (seg59:ip-172-21-4-235.ec2.internal).
>  Statement statistics:
>    Memory used: 262144K bytes
>  Settings:  default_hash_table_bucket_number=60; optimizer=on
>  Optimizer status: PQO version 1.638
>  Dispatcher statistics:
>    executors used(total/cached/new connection): (60/1/59); dispatcher time(total/connection/dispatch data): (3254.181 ms/1480539781639.241 ms/0.483 ms).
>    dispatch data time(max/min/avg): (0.021 ms/0.005 ms/0.007 ms); consume executor data time(max/min/avg): (0.044 ms/0.003 ms/0.009 ms); free executor time(max/min/avg): (0.000 ms/0.000 ms/0.000 ms).
>  Data locality statistics:
>    data locality ratio: 1.000; virtual segment number: 60; different host number: 10; virtual segment number per host(avg/min/max): (6/6/6); segment size(avg/min/max): (186145.950 B/182662 B/189757 B); segment size with penalty(avg/min/max): (0.000 B/0 B/0 B); continuity(avg/min/max): (1.000/1.000/1.000); DFS metadatacache: 0.263 ms; resource allocation: 0.521 ms; datalocality calculation: 0.114 ms.
>  Total runtime: 19549.798 ms
> (21 rows)
>
> Time: 19651.159 ms
>
> This is just a small dimension table too.
>
> I then loaded that data from the local table to a new random table.
>
> gpadmin=# explain analyze select count(*) from tpcds_opt.date_dim;
>
>                               QUERY PLAN
> ---------------------------------------------------------------------------
>  Aggregate  (cost=0.00..436.36 rows=1 width=8)
>    Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(seg-1:ip-172-21-4-229.ec2.internal/seg-1:ip-172-21-4-229.ec2.internal) 1/1 rows with 2624/2624 ms to end, start offset by 1.081/1.081 ms.
>    ->  Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..436.36 rows=1 width=8)
>          Rows out:  Avg 1.0 rows x 1 workers at destination.  Max/Last(seg-1:ip-172-21-4-229.ec2.internal/seg-1:ip-172-21-4-229.ec2.internal) 1/1 rows with 2624/2624 ms to end, start offset by 1.082/1.082 ms.
>          ->  Aggregate  (cost=0.00..436.36 rows=1 width=8)
>                Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(seg0:ip-172-21-4-225.ec2.internal/seg0:ip-172-21-4-225.ec2.internal) 1/1 rows with 2621/2621 ms to end, start offset by 0.846/0.846 ms.
>                ->  Table Scan on date_dim  (cost=0.00..436.22 rows=73049 width=1)
>                      Rows out:  Avg 73049.0 rows x 1 workers.  Max/Last(seg0:ip-172-21-4-225.ec2.internal/seg0:ip-172-21-4-225.ec2.internal) 73049/73049 rows with 2.966/2.966 ms to first row, 2595/2595 ms to end, start offset by 0.847/0.847 ms.
>  Slice statistics:
>    (slice0)    Executor memory: 170K bytes.
>    (slice1)    Executor memory: 343K bytes (seg0:ip-172-21-4-225.ec2.internal).
>  Statement statistics:
>    Memory used: 262144K bytes
>  Settings:  default_hash_table_bucket_number=60; optimizer=on
>  Optimizer status: PQO version 1.638
>  Dispatcher statistics:
>    executors used(total/cached/new connection): (1/1/0); dispatcher time(total/connection/dispatch data): (0.122 ms/0.000 ms/0.015 ms).
>    dispatch data time(max/min/avg): (0.015 ms/0.015 ms/0.015 ms); consume executor data time(max/min/avg): (0.011 ms/0.011 ms/0.011 ms); free executor time(max/min/avg): (0.000 ms/0.000 ms/0.000 ms).
>  Data locality statistics:
>    data locality ratio: 0.296; virtual segment number: 1; different host number: 1; virtual segment number per host(avg/min/max): (1/1/1); segment size(avg/min/max): (11168757.000 B/11168757 B/11168757 B); segment size with penalty(avg/min/max): (11247341.000 B/11247341 B/11247341 B); continuity(avg/min/max): (1.000/1.000/1.000); DFS metadatacache: 0.254 ms; resource allocation: 0.387 ms; datalocality calculation: 0.130 ms.
>  Total runtime: 2627.711 ms
> (21 rows)
>
> Time: 2728.409 ms
>
> I'm able to decrease the query execution time of many queries by
> increasing hawq_rm_nvseg_perquery_perseg_limit but only for the tables
> loaded from other local tables and not tables loaded from external tables.
> Based on the documentation, this appears to be the expected behavior.
>
> - Are there plans to correct this?
> - What happens if the cluster expands?  Will these random tables need to
> be redistributed?
> - Are there workarounds to this issue?
>
> Jon Roberts
>



-- 
Thanks

Hubert Zhang

Re: Random tables loaded from files

Posted by Jon Roberts <jr...@pivotal.io>.
I'm happy to report that I made an error!  I thought I had set my scripts
to use random distribution for the tpcds schema but I used hash.  That is
why it wasn't behaving as I had expected.
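For anyone double-checking their own schema, something along these lines
should show the distribution of every table (column names follow the HAWQ
2.x catalog; adjust if yours differs):

  -- List every table in the tpcds schema with its distribution policy.
  SELECT c.relname,
         CASE WHEN p.attrnums IS NULL THEN 'random' ELSE 'hash' END AS distribution,
         p.bucketnum
  FROM   pg_class c
  JOIN   pg_namespace n ON n.oid = c.relnamespace
  LEFT JOIN gp_distribution_policy p ON p.localoid = c.oid
  WHERE  n.nspname = 'tpcds' AND c.relkind = 'r'
  ORDER  BY c.relname;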

Sorry for the noise.

Jon Roberts
Principal Engineer | jroberts@pivotal.io | 615-426-8661
