Posted to user@hawq.apache.org by Marek Wiewiorka <ma...@gmail.com> on 2016/02/21 22:28:07 UTC

Controlling number of threads per segment

Hi All,
I've spent a lot of time trying to figure out how to control the number of
segment instances/threads per node in HAWQ and couldn't find any information
on that.
In fact I'm a bit confused:
1)in this blog entry:
http://0x0fff.com/spark-dataframes-are-faster-arent-they/#more-268

I found that Alexey had a cluster of 4 nodes with 10 threads running on each
node.
If I query the same table (gp_segment_configuration) in my installation I
get only 5 rows, one per node, which might indicate that I have only
one segment instance running per node.

2) On the other hand, when I monitor CPU utilization while running some test
queries I can observe that 8 threads per node are actually active. I can
also see that all my tables have at most 40 segments.

I found some pieces of information on 2 params:
NSegs
The number of segment instances to run per segment
host.

gp_vmem_protect_limit
The amount of memory allowed to a single segment instance on a host.

But when I tried to put them in postgresql.conf on my nodes, I found in the
log that both are unrecognized:

2016-02-21 20:16:36.720257 GMT,,,p20790,th-670398080,,,,0,,,seg-10000,,,,,"LOG","42704","unrecognized configuration parameter ""gp_vmem_protect_limit""",,,,,,,,"set_config_option","guc.c",9933,
2016-02-21 20:16:36.720913 GMT,,,p20790,th-670398080,,,,0,,,seg-10000,,,,,"LOG","42704","unrecognized configuration parameter ""NSegs""",,,,,,,,"set_config_option","guc.c",9933,


So my questions are:
1) How do I control the number of threads/segment instances per node?
2) How do I control the number of segments per table?
I suspect that these 2 things might be somehow interconnected.

Many thanks for any hints on that.

Marek

Re: Controlling number of threads per segment

Posted by Alexey Grishchenko <pr...@gmail.com>.
Hi

1) I used Pivotal HAWQ v1.3, which is why it showed this number. The
version of HAWQ that was released to Apache comes from the HAWQ 2.0 branch,
and it works a bit differently. It creates a single "segment" per node,
but in fact it can spawn any number of executors per node (as long as you
have enough resources).

2) In Apache HAWQ the number of virtual segments (i.e. executors) brought
up depends on many factors, and the most important one is the data size.
This is why you observe many executors per node. To get more information on
what's happening, I'd recommend running "explain analyze <your query>";
this way you get all the details, including the number of executors
used.

Regarding parameters, I'd recommend checking the documentation here:
http://hdb.docs.pivotal.io/. gp_vmem_protect_limit is deprecated.

1) You can enforce the vseg number with the enforce_virtual_segment_number
GUC at the session level. Set the default number with default_segment_num in
hawq-site.xml.
2) Do you mean the number of buckets? You can set it at table creation time:
CREATE TABLE t1(c1 int) WITH (bucketnum = 3);
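
Put together, the session-level steps above might look roughly like this. It
is only a sketch: the value 8 and the count(*) query are arbitrary examples,
and it assumes a table named t1 does not exist yet.

```sql
-- One row per registered instance; in HAWQ 2.0 expect one segment per node.
SELECT * FROM gp_segment_configuration;

-- Fix the bucket count of a new table at creation time.
CREATE TABLE t1 (c1 int) WITH (bucketnum = 3);

-- Enforce the number of virtual segments for this session only
-- (8 is an arbitrary example value).
SET enforce_virtual_segment_number = 8;

-- The output includes the details of the run, including how many
-- executors were used.
EXPLAIN ANALYZE SELECT count(*) FROM t1;
```

The cluster-wide default (default_segment_num) is not shown here, since it
lives in hawq-site.xml rather than in the session.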







-- 
Alexey Grishchenko, http://0x0fff.com