Posted to user@hawq.apache.org by John Glorioso <jo...@staq.com> on 2018/10/16 22:31:39 UTC

wrap around xid warnings

We are running a decent-sized cluster (20 nodes) and started receiving the
following warning today:

WARNING:  database "hcatalog" must be vacuumed within 42324912
transactions  (seg27 worker09.xxx.com:40000 pid=689628)
HINT:  To avoid a database shutdown, execute a full-database VACUUM in
"hcatalog".

I found this Pivotal article:
https://community.pivotal.io/s/article/About-XIDs-and-XID-Wraparound-in-Greenplum
which discusses XID wraparound, and that led me to a query that identifies
the tables in each database that need VACUUM FREEZE run against them.

https://community.pivotal.io/s/article/Vacuuming-Old-Tables-in-a-Database-to-Reduce-its-Age
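
The per-table query from that article is essentially of this form (paraphrased
from memory, so treat it as a sketch rather than the exact statement):

SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'          -- ordinary tables only
ORDER BY xid_age DESC;

For each old table it reports, the fix is simply VACUUM FREEZE schema.table;
run while connected to that database.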

The issue, however, is that it is complaining about the hcatalog database,
which is not an accessible database; it was only used to query data in Hive
at one point. We have since disabled Hive and no longer need that external
PXF link, but I am unclear on how to remove it.

This article shows how to check database age, and I can run the first part,
but it is written for Greenplum and I do not know the equivalent properties
in HAWQ needed for the CASE statement to work.

https://community.pivotal.io/s/article/How-to-check-database-age-in-Greenplum

The results of the following SQL statement are below:

SELECT gp_segment_id, datname, age(datfrozenxid) age FROM pg_database
UNION ALL
SELECT gp_segment_id, datname, age(datfrozenxid) age FROM
gp_dist_random('pg_database');

 gp_segment_id |  datname   |    age
---------------+------------+------------
            -1 | hcatalog   |  212776719
            -1 | template1  |      39104
            -1 | template0  |  212775841
            -1 | production |   10604565
            -1 | postgres   |      39104
            -1 | gpadmin    |      39104
             0 | template1  | 2100536422
             0 | hcatalog   | 2100537300
(8 rows)

I cannot access hcatalog or template1 to issue VACUUM FREEZE statements
against the tables and am unsure how to proceed.

Any help is greatly appreciated!

John Glorioso
Lead Engineer
john@staq.com

Re: wrap around xid warnings

Posted by John Glorioso <jo...@staq.com>.
You can definitely do it in single-user mode with a command like:

/path_to_hawq/bin/postgres --single -P -O -p 9999 -D /xxxx/hawq/segment -c
gp_session_role=utility template1

I was able to do this on the affected segments once I realized that
template1 is the local database HAWQ uses for parallel processing. Once in
there, I was able to view the datfrozenxid and age of the hcatalog entry on
the older segments. At that point, since we have now removed Hive from the
cluster, I was curious to see what queries against hcatalog would return.

When I issued \dt hcatalog.*.* it returned all of the local schemas and
tables, which I thought odd. When I then tried \d
hcatalog.template1.some_table, it timed out with the expected error that the
Hive metastore service could not be contacted. That made me realize that the
XID reported for the hcatalog database was essentially meaningless.

With that, as a test I updated hcatalog's datfrozenxid to match template1's,
since I was not concerned about wraparound or potential data loss given that
there is no data in hcatalog. This is, of course, not ideal, but it
succeeded in eliminating the warnings.
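
For anyone hitting the same thing, the session looked roughly like this (a
sketch only, not a recommendation: it writes directly to pg_database, the
port and data directory are specific to our segment, and you may also need
-O or the system-table-mods GUC for the catalog UPDATE to be accepted):

/path_to_hawq/bin/postgres --single -P -O -p 9999 -D /xxxx/hawq/segment -c gp_session_role=utility template1

-- inside the single-user backend, one statement per line:
SELECT datname, datfrozenxid, age(datfrozenxid) FROM pg_database
UPDATE pg_database SET datfrozenxid = (SELECT datfrozenxid FROM pg_database WHERE datname = 'template1') WHERE datname = 'hcatalog'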

Lastly, I searched through the logs yesterday and found that it has been
trying to autovacuum hcatalog unsuccessfully for quite some time. The log
entry was:

"ERROR","3D000","""hcatalog"" database is only for system
use",,,,,,,0,,"postinit.c",386

which leads to the source:
https://github.com/apache/hawq/blob/master/src/backend/utils/init/postinit.c#L376
where connections are specifically blocked because hcatalog is an
integration. I get that, but autovacuum is a system process and I would
expect it to succeed, so wouldn't this problem be prevalent in any cluster
that used the hcatalog integration and reached the limit?

We have an extremely active cluster and reached two billion transactions in
under a year. It would be unfortunate to have to go through this maintenance
routinely. I look forward to more feedback on my workaround or any other
explanations that would help keep this from coming up again in the future.

John Glorioso
Lead Engineer
john@staq.com




On Thu, Oct 18, 2018 at 5:34 AM Aitor Cedres <ac...@pivotal.io> wrote:

> Hi John,
>
> The problem seems to be that database 'hcatalog' has connections disabled
> in pg_database. If you run 'select * from pg_database', the column
> 'datallowconn' should be 'f'. I think a regular user is not allowed to make
> modification to the system tables. There was a hidden GUC to allow DML
> operations on the catalog, but I can't remember from the top of my head.
>
> If you manage to get DML modification on the catalog, running something
> like `UPDATE pg_database SET datallowconn = 't' where datname = 'hcatalog'`
> should allow you to access the database and run VACUUM FREEZE.
>
> Best regards,
> Aitor
>
> On 17/10/2018 20:26, John Glorioso wrote:
>
> Digging deeper on this, I found that HAWQ has been trying to auto vacuum
> hcatalog for months now with the following error:
>
> 2018-09-16 14:19:01.646399
> UTC,"xxx","yyy",p252354,th493275424,"11.22.33.44","58083",2018-09-16
> 14:18:57
> UTC,0,con433331,cmd114,seg0,slice1,,,sx1,"LOG","00000","GetNewTransactionId:
> requesting autovac (xid 1932591104 xidVacLimit 200000003)",,,,,,"select
> Ta.client_commissioned_measurement_methodology as v,
> count(Ta.client_commissioned_measurement_methodology)::float4/588.000000::float4
> as f from my_schema.my_table as Ta where
> Ta.client_commissioned_measurement_methodology is not null group by
> (Ta.client_commissioned_measurement_methodology) order by f desc limit
> 25",0,,"varsup.c",98,
> 2018-09-16 14:19:01.663978
> UTC,,,p252782,th493275424,,,,0,,,seg-10000,,,,,"ERROR","3D000","""hcatalog""
> database is only for system use",,,,,,,0,,"postinit.c",386,
>
> hcatalog being a system database (and one that isn't even real
> technically) seems to be causing the issue. I have successfully vacuumed
> the others.
>
> John Glorioso
> Lead Engineer
> john@staq.com
>
>
>
>
> On Wed, Oct 17, 2018 at 9:19 AM John Glorioso <jo...@staq.com> wrote:
>
>> I believe the issue is the FREEZE vacuum option. Is there a setting in
>> config that I can verify? Better yet since we no longer use hcatalog to
>> connect to hive is there a way to just simply disable or clear
>> datfrozenxid? I don't imagine dropping hcatalog would work and I am
>> hesitant to try to manually update that value to null.
>>
>> -John
>>
>> On Wed, Oct 17, 2018, 1:49 AM Paul Guo <pa...@gmail.com> wrote:
>>
>>> Looks like HAWQ should do auto-vacuum for those databases which DBA can
>>> not connect?
>>>
>>>
>>> John Glorioso <jo...@staq.com> 于2018年10月17日周三 上午6:32写道:
>>>
>>>> We are running a decent sized cluster (20 nodes) and have started
>>>> receiving the following error today:
>>>>
>>>> WARNING:  database "hcatalog" must be vacuumed within 42324912
>>>> transactions  (seg27 worker09.xxx.com:40000 pid=689628)
>>>> HINT:  To avoid a database shutdown, execute a full-database VACUUM in
>>>> "hcatalog".
>>>>
>>>> I found this Pivotal article:
>>>>
>>>> https://community.pivotal.io/s/article/About-XIDs-and-XID-Wraparound-in-Greenplum
>>>> which discusses xid wrap around and that led me to running a query
>>>> which identified tables in each database that need a vacuum freeze run
>>>> against it.
>>>>
>>>>
>>>> https://community.pivotal.io/s/article/Vacuuming-Old-Tables-in-a-Database-to-Reduce-its-Age
>>>>
>>>> The issue, however, is that it is complaining about the hcatalog
>>>> database which is not an accessible database as it was used to query data
>>>> in Hive at one point. We have since disabled Hive and no longer need that
>>>> external PXF link but I am unclear on how to remove it.
>>>>
>>>> This article shows how to detect database age and I can run the first
>>>> part but it is for Greenplum and I do not know the equivalent properties in
>>>> HAWQ for the case statement to work.
>>>>
>>>>
>>>> https://community.pivotal.io/s/article/How-to-check-database-age-in-Greenplum
>>>>
>>>> The results for the following sql statement are below:
>>>>
>>>> SELECT gp_segment_id, datname, age(datfrozenxid) age FROM pg_database
>>>> UNION ALL
>>>> SELECT gp_segment_id, datname, age(datfrozenxid) age FROM
>>>> gp_dist_random('pg_database');
>>>>
>>>>  gp_segment_id |  datname   |    age
>>>> ---------------+------------+------------
>>>>             -1 | hcatalog   |  212776719
>>>>             -1 | template1  |      39104
>>>>             -1 | template0  |  212775841
>>>>             -1 | production |   10604565
>>>>             -1 | postgres   |      39104
>>>>             -1 | gpadmin    |      39104
>>>>              0 | template1  | 2100536422
>>>>              0 | hcatalog   | 2100537300
>>>> (8 rows)
>>>>
>>>> I cannot access hcatalog or template1 to issue vacuum freeze statements
>>>> against the tables and am unsure how to proceed.
>>>>
>>>> Any help is greatly appreciated!
>>>>
>>>> John Glorioso
>>>> Lead Engineer
>>>> john@staq.com
>>>>
>>>>
>>>>
>

Re: wrap around xid warnings

Posted by Aitor Cedres <ac...@pivotal.io>.
Hi John,

The problem seems to be that database 'hcatalog' has connections 
disabled in pg_database. If you run 'select * from pg_database', the 
column 'datallowconn' should be 'f'. I think a regular user is not 
allowed to make modifications to the system tables. There was a hidden 
GUC to allow DML operations on the catalog, but I can't remember it off 
the top of my head.

If you manage to enable DML modifications on the catalog, running something 
like `UPDATE pg_database SET datallowconn = 't' WHERE datname = 
'hcatalog'` should allow you to access the database and run VACUUM FREEZE.
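
Something along these lines (just a sketch; the GUC might be
allow_system_table_mods, but I'm not certain of the exact name or value in
HAWQ, so please double-check before relying on it):

-- as a superuser, allow DML on catalog tables (GUC name from memory)
SET allow_system_table_mods = 'dml';
UPDATE pg_database SET datallowconn = 't' WHERE datname = 'hcatalog';

-- then reconnect to the hcatalog database and run:
VACUUM FREEZE;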

Best regards,
Aitor


On 17/10/2018 20:26, John Glorioso wrote:
> Digging deeper on this, I found that HAWQ has been trying to auto 
> vacuum hcatalog for months now with the following error:
>
> 2018-09-16 14:19:01.646399 
> UTC,"xxx","yyy",p252354,th493275424,"11.22.33.44","58083",2018-09-16 
> 14:18:57 
> UTC,0,con433331,cmd114,seg0,slice1,,,sx1,"LOG","00000","GetNewTransactionId: 
> requesting autovac (xid 1932591104 xidVacLimit 
> 200000003)",,,,,,"select 
> Ta.client_commissioned_measurement_methodology as v, 
> count(Ta.client_commissioned_measurement_methodology)::float4/588.000000::float4 
> as f from my_schema.my_table as Ta where 
> Ta.client_commissioned_measurement_methodology is not null group by 
> (Ta.client_commissioned_measurement_methodology) order by f desc limit 
> 25",0,,"varsup.c",98,
> 2018-09-16 14:19:01.663978 
> UTC,,,p252782,th493275424,,,,0,,,seg-10000,,,,,"ERROR","3D000","""hcatalog"" 
> database is only for system use",,,,,,,0,,"postinit.c",386,
>
> hcatalog being a system database (and one that isn't even real 
> technically) seems to be causing the issue. I have successfully 
> vacuumed the others.
>
> John Glorioso
> Lead Engineer
> john@staq.com <ma...@staq.com>
>
>
>
>
> On Wed, Oct 17, 2018 at 9:19 AM John Glorioso <john@staq.com 
> <ma...@staq.com>> wrote:
>
>     I believe the issue is the FREEZE vacuum option. Is there a
>     setting in config that I can verify? Better yet since we no longer
>     use hcatalog to connect to hive is there a way to just simply
>     disable or clear datfrozenxid? I don't imagine dropping hcatalog
>     would work and I am hesitant to try to manually update that value
>     to null.
>
>     -John
>
>     On Wed, Oct 17, 2018, 1:49 AM Paul Guo <paulguo@gmail.com
>     <ma...@gmail.com>> wrote:
>
>         Looks like HAWQ should do auto-vacuum for those databases
>         which DBA can not connect?
>
>
>         John Glorioso <john@staq.com <ma...@staq.com>>
>         于2018年10月17日周三 上午6:32写道:
>
>             We are running a decent sized cluster (20 nodes) and have
>             started receiving the following error today:
>
>             WARNING:  database "hcatalog" must be vacuumed within
>             42324912 transactions (seg27 worker09.xxx.com:40000
>             <http://worker09.xxx.com:40000> pid=689628)
>             HINT:  To avoid a database shutdown, execute a
>             full-database VACUUM in "hcatalog".
>
>             I found this Pivotal article:
>             https://community.pivotal.io/s/article/About-XIDs-and-XID-Wraparound-in-Greenplum
>             which discusses xid wrap around and that led me to running
>             a query which identified tables in each database that need
>             a vacuum freeze run against it.
>
>             https://community.pivotal.io/s/article/Vacuuming-Old-Tables-in-a-Database-to-Reduce-its-Age
>
>             The issue, however, is that it is complaining about the
>             hcatalog database which is not an accessible database as
>             it was used to query data in Hive at one point. We have
>             since disabled Hive and no longer need that external PXF
>             link but I am unclear on how to remove it.
>
>             This article shows how to detect database age and I can
>             run the first part but it is for Greenplum and I do not
>             know the equivalent properties in HAWQ for the case
>             statement to work.
>
>             https://community.pivotal.io/s/article/How-to-check-database-age-in-Greenplum
>
>             The results for the following sql statement are below:
>
>             SELECT gp_segment_id, datname, age(datfrozenxid) age FROM
>             pg_database
>             UNION ALL
>             SELECT gp_segment_id, datname, age(datfrozenxid) age FROM
>             gp_dist_random('pg_database');
>
>              gp_segment_id |  datname   |    age
>             ---------------+------------+------------
>                         -1 | hcatalog   | 212776719
>                         -1 | template1  | 39104
>                         -1 | template0  | 212775841
>                         -1 | production |  10604565
>                         -1 | postgres   | 39104
>                         -1 | gpadmin    | 39104
>                          0 | template1  | 2100536422
>                          0 | hcatalog   | 2100537300
>             (8 rows)
>
>             I cannot access hcatalog or template1 to issue vacuum
>             freeze statements against the tables and am unsure how to
>             proceed.
>
>             Any help is greatly appreciated!
>
>             John Glorioso
>             Lead Engineer
>             john@staq.com <ma...@staq.com>
>
>


Re: wrap around xid warnings

Posted by John Glorioso <jo...@staq.com>.
Digging deeper on this, I found that HAWQ has been trying to autovacuum
hcatalog for months now, failing with the following error:

2018-09-16 14:19:01.646399
UTC,"xxx","yyy",p252354,th493275424,"11.22.33.44","58083",2018-09-16
14:18:57
UTC,0,con433331,cmd114,seg0,slice1,,,sx1,"LOG","00000","GetNewTransactionId:
requesting autovac (xid 1932591104 xidVacLimit 200000003)",,,,,,"select
Ta.client_commissioned_measurement_methodology as v,
count(Ta.client_commissioned_measurement_methodology)::float4/588.000000::float4
as f from my_schema.my_table as Ta where
Ta.client_commissioned_measurement_methodology is not null group by
(Ta.client_commissioned_measurement_methodology) order by f desc limit
25",0,,"varsup.c",98,
2018-09-16 14:19:01.663978
UTC,,,p252782,th493275424,,,,0,,,seg-10000,,,,,"ERROR","3D000","""hcatalog""
database is only for system use",,,,,,,0,,"postinit.c",386,

hcatalog being a system database (and one that technically isn't even a real
database) seems to be causing the issue. I have successfully vacuumed the others.

John Glorioso
Lead Engineer
john@staq.com




On Wed, Oct 17, 2018 at 9:19 AM John Glorioso <jo...@staq.com> wrote:

> I believe the issue is the FREEZE vacuum option. Is there a setting in
> config that I can verify? Better yet since we no longer use hcatalog to
> connect to hive is there a way to just simply disable or clear
> datfrozenxid? I don't imagine dropping hcatalog would work and I am
> hesitant to try to manually update that value to null.
>
> -John
>
> On Wed, Oct 17, 2018, 1:49 AM Paul Guo <pa...@gmail.com> wrote:
>
>> Looks like HAWQ should do auto-vacuum for those databases which DBA can
>> not connect?
>>
>>
>> John Glorioso <jo...@staq.com> 于2018年10月17日周三 上午6:32写道:
>>
>>> We are running a decent sized cluster (20 nodes) and have started
>>> receiving the following error today:
>>>
>>> WARNING:  database "hcatalog" must be vacuumed within 42324912
>>> transactions  (seg27 worker09.xxx.com:40000 pid=689628)
>>> HINT:  To avoid a database shutdown, execute a full-database VACUUM in
>>> "hcatalog".
>>>
>>> I found this Pivotal article:
>>>
>>> https://community.pivotal.io/s/article/About-XIDs-and-XID-Wraparound-in-Greenplum
>>> which discusses xid wrap around and that led me to running a query which
>>> identified tables in each database that need a vacuum freeze run against it.
>>>
>>>
>>> https://community.pivotal.io/s/article/Vacuuming-Old-Tables-in-a-Database-to-Reduce-its-Age
>>>
>>> The issue, however, is that it is complaining about the hcatalog
>>> database which is not an accessible database as it was used to query data
>>> in Hive at one point. We have since disabled Hive and no longer need that
>>> external PXF link but I am unclear on how to remove it.
>>>
>>> This article shows how to detect database age and I can run the first
>>> part but it is for Greenplum and I do not know the equivalent properties in
>>> HAWQ for the case statement to work.
>>>
>>>
>>> https://community.pivotal.io/s/article/How-to-check-database-age-in-Greenplum
>>>
>>> The results for the following sql statement are below:
>>>
>>> SELECT gp_segment_id, datname, age(datfrozenxid) age FROM pg_database
>>> UNION ALL
>>> SELECT gp_segment_id, datname, age(datfrozenxid) age FROM
>>> gp_dist_random('pg_database');
>>>
>>>  gp_segment_id |  datname   |    age
>>> ---------------+------------+------------
>>>             -1 | hcatalog   |  212776719
>>>             -1 | template1  |      39104
>>>             -1 | template0  |  212775841
>>>             -1 | production |   10604565
>>>             -1 | postgres   |      39104
>>>             -1 | gpadmin    |      39104
>>>              0 | template1  | 2100536422
>>>              0 | hcatalog   | 2100537300
>>> (8 rows)
>>>
>>> I cannot access hcatalog or template1 to issue vacuum freeze statements
>>> against the tables and am unsure how to proceed.
>>>
>>> Any help is greatly appreciated!
>>>
>>> John Glorioso
>>> Lead Engineer
>>> john@staq.com
>>>
>>>
>>>

Re: wrap around xid warnings

Posted by John Glorioso <jo...@staq.com>.
I believe the issue is the FREEZE vacuum option. Is there a setting in the
config that I can verify? Better yet, since we no longer use hcatalog to
connect to Hive, is there a way to simply disable or clear datfrozenxid? I
don't imagine dropping hcatalog would work, and I am hesitant to manually
update that value to null.

-John

On Wed, Oct 17, 2018, 1:49 AM Paul Guo <pa...@gmail.com> wrote:

> Looks like HAWQ should do auto-vacuum for those databases which DBA can
> not connect?
>
>
> John Glorioso <jo...@staq.com> 于2018年10月17日周三 上午6:32写道:
>
>> We are running a decent sized cluster (20 nodes) and have started
>> receiving the following error today:
>>
>> WARNING:  database "hcatalog" must be vacuumed within 42324912
>> transactions  (seg27 worker09.xxx.com:40000 pid=689628)
>> HINT:  To avoid a database shutdown, execute a full-database VACUUM in
>> "hcatalog".
>>
>> I found this Pivotal article:
>>
>> https://community.pivotal.io/s/article/About-XIDs-and-XID-Wraparound-in-Greenplum
>> which discusses xid wrap around and that led me to running a query which
>> identified tables in each database that need a vacuum freeze run against it.
>>
>>
>> https://community.pivotal.io/s/article/Vacuuming-Old-Tables-in-a-Database-to-Reduce-its-Age
>>
>> The issue, however, is that it is complaining about the hcatalog database
>> which is not an accessible database as it was used to query data in Hive at
>> one point. We have since disabled Hive and no longer need that external PXF
>> link but I am unclear on how to remove it.
>>
>> This article shows how to detect database age and I can run the first
>> part but it is for Greenplum and I do not know the equivalent properties in
>> HAWQ for the case statement to work.
>>
>>
>> https://community.pivotal.io/s/article/How-to-check-database-age-in-Greenplum
>>
>> The results for the following sql statement are below:
>>
>> SELECT gp_segment_id, datname, age(datfrozenxid) age FROM pg_database
>> UNION ALL
>> SELECT gp_segment_id, datname, age(datfrozenxid) age FROM
>> gp_dist_random('pg_database');
>>
>>  gp_segment_id |  datname   |    age
>> ---------------+------------+------------
>>             -1 | hcatalog   |  212776719
>>             -1 | template1  |      39104
>>             -1 | template0  |  212775841
>>             -1 | production |   10604565
>>             -1 | postgres   |      39104
>>             -1 | gpadmin    |      39104
>>              0 | template1  | 2100536422
>>              0 | hcatalog   | 2100537300
>> (8 rows)
>>
>> I cannot access hcatalog or template1 to issue vacuum freeze statements
>> against the tables and am unsure how to proceed.
>>
>> Any help is greatly appreciated!
>>
>> John Glorioso
>> Lead Engineer
>> john@staq.com
>>
>>
>>

Re: wrap around xid warnings

Posted by Paul Guo <pa...@gmail.com>.
Looks like HAWQ should do auto-vacuum for those databases which the DBA
cannot connect to?


John Glorioso <jo...@staq.com> 于2018年10月17日周三 上午6:32写道:

> We are running a decent sized cluster (20 nodes) and have started
> receiving the following error today:
>
> WARNING:  database "hcatalog" must be vacuumed within 42324912
> transactions  (seg27 worker09.xxx.com:40000 pid=689628)
> HINT:  To avoid a database shutdown, execute a full-database VACUUM in
> "hcatalog".
>
> I found this Pivotal article:
>
> https://community.pivotal.io/s/article/About-XIDs-and-XID-Wraparound-in-Greenplum
> which discusses xid wrap around and that led me to running a query which
> identified tables in each database that need a vacuum freeze run against it.
>
>
> https://community.pivotal.io/s/article/Vacuuming-Old-Tables-in-a-Database-to-Reduce-its-Age
>
> The issue, however, is that it is complaining about the hcatalog database
> which is not an accessible database as it was used to query data in Hive at
> one point. We have since disabled Hive and no longer need that external PXF
> link but I am unclear on how to remove it.
>
> This article shows how to detect database age and I can run the first part
> but it is for Greenplum and I do not know the equivalent properties in HAWQ
> for the case statement to work.
>
>
> https://community.pivotal.io/s/article/How-to-check-database-age-in-Greenplum
>
> The results for the following sql statement are below:
>
> SELECT gp_segment_id, datname, age(datfrozenxid) age FROM pg_database
> UNION ALL
> SELECT gp_segment_id, datname, age(datfrozenxid) age FROM
> gp_dist_random('pg_database');
>
>  gp_segment_id |  datname   |    age
> ---------------+------------+------------
>             -1 | hcatalog   |  212776719
>             -1 | template1  |      39104
>             -1 | template0  |  212775841
>             -1 | production |   10604565
>             -1 | postgres   |      39104
>             -1 | gpadmin    |      39104
>              0 | template1  | 2100536422
>              0 | hcatalog   | 2100537300
> (8 rows)
>
> I cannot access hcatalog or template1 to issue vacuum freeze statements
> against the tables and am unsure how to proceed.
>
> Any help is greatly appreciated!
>
> John Glorioso
> Lead Engineer
> john@staq.com
>
>
>