Posted to user@hawq.apache.org by Hubert Zhang <hz...@pivotal.io> on 2017/04/06 02:13:20 UTC

Re: HAWQ: Web external table on segments.

Why not use gpssh to execute a shell command on each node?

On Wed, Apr 5, 2017 at 3:11 PM, Cyrille Lintz <cl...@pivotal.io> wrote:

> Hello,
>
> From the HDB guide
> (http://hdb.docs.pivotal.io/212/hawq/reference/sql/CREATE-EXTERNAL-TABLE.html#topic1__section4),
> I read the following about web external tables:
>
> *Note: ON ALL/HOST is deprecated when creating a readable external table,
> as HAWQ cannot guarantee scheduling executors on a specific host. Instead,
> use ON MASTER, ON <number>, or SEGMENT <virtual_segment> to specify which
> segment instances will execute the command.*
>
>
> In my opinion, if possible, we should re-introduce the ON ALL option for
> external web tables.
> I am concerned about the ON <number> option in the external web table
> definition: we have to use the current number of hosts, so if we expand
> the cluster, we have to change the external web table.
>
> - If we have a value smaller than the actual number of hosts, some rows
> will be missing.
> - If we have a value greater than the actual number of hosts, some rows
> will be duplicated.
>
>
> If we add the option ON ALL:
>
> - it will help to monitor the spill files
> - it will help to read the segment log files (see the commented DDL
> hawq_toolkit._hawq_log_segment_ext in the file $GPHOME/share/postgresql)
>
>
> I know that the ON HOST and ON ALL options were deprecated because of the
> elastic runtime in HAWQ 2.x, which is related to the Hadoop architecture.
>
> However, how can we execute a shell command exactly once on each host of
> the cluster via an external web table?
> In this case, we are using the local FS, not the Hadoop FS.
>
> Thanks,
>
>
> *Cyrille LINTZ* Advisory Solution Architect  |  Pivotal Europe South
> Mobile: + 33 (0)6 11 48 71 10 | clintz@pivotal.io
>



-- 
Thanks

Hubert Zhang

Re: HAWQ: Web external table on segments.

Posted by Hubert Zhang <hz...@pivotal.io>.
Virtual segments are requested from the Resource Manager, which cannot
guarantee that resources are available on a certain node at a certain time.
@yijin, do you have any comments on this?




-- 
Thanks

Hubert Zhang

Re: HAWQ: Web external table on segments.

Posted by Cyrille Lintz <cl...@pivotal.io>.
Hello,

gpssh could be a solution, but it requires access to the master.
I want to create external tables for users who do not have access to the
master.

For example, I would like to create an external table in order to monitor
the spill files.

DROP EXTERNAL TABLE IF EXISTS spills;

-- Keep the EXECUTE string on one line: the shell treats a newline as the
-- end of a command.
CREATE EXTERNAL WEB TABLE spills (hostname text, size text, path text)
EXECUTE E'du -sb /datac/hawq/segment /datad/hawq/segment /datae/hawq/segment /dataf/hawq/segment /datah/hawq/segment /datai/hawq/segment /dataj/hawq/segment /datak/hawq/segment | sed "s/^/$(hostname)\t /"'
ON 3
FORMAT 'TEXT' (DELIMITER E'\t');
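
To preview the rows that one executor would return, the same pipeline can be tried locally. This is only a minimal sketch: it assumes GNU du and GNU sed, and the temporary directory and file name are hypothetical stand-ins for a real path such as /datac/hawq/segment.

```shell
#!/bin/sh
# Hypothetical stand-in for a segment directory such as /datac/hawq/segment.
workdir=$(mktemp -d)
mkdir -p "$workdir/segment"
printf 'spill' > "$workdir/segment/workfile"

# Same pipeline as in the EXECUTE clause: du prints "<bytes><TAB><path>",
# and sed prefixes each line with the host name, a tab, and a space.
# GNU sed expands \t in the replacement to a tab character.
rows=$(du -sb "$workdir/segment" | sed "s/^/$(hostname)\t /")
printf '%s\n' "$rows"

# Remove the temporary directory again.
rm -rf "$workdir"
```

Each output line has three tab-separated fields, matching the hostname, size, and path columns of the table. Because of the space after the tab, the size value starts with a blank, which is harmless as long as the column is declared as text.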


Thanks,


*Cyrille LINTZ* Advisory Solution Architect  |  Pivotal Europe South
Mobile: + 33 (0)6 11 48 71 10 | clintz@pivotal.io
