Posted to issues@hawq.apache.org by "Ed Espino (JIRA)" <ji...@apache.org> on 2017/07/11 14:54:04 UTC

[jira] [Updated] (HAWQ-1498) Segments keep open file descriptors for deleted files

     [ https://issues.apache.org/jira/browse/HAWQ-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ed Espino updated HAWQ-1498:
----------------------------
    Fix Version/s:     (was: 2.2.0.0-incubating)
                   2.3.0.0-incubating

> Segments keep open file descriptors for deleted files
> -----------------------------------------------------
>
>                 Key: HAWQ-1498
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1498
>             Project: Apache HAWQ
>          Issue Type: Bug
>            Reporter: Harald Bögeholz
>            Assignee: Radar Lei
>             Fix For: 2.3.0.0-incubating
>
>
> I have been running some large computations in HAWQ using psql on the master. These computations created temporary tables and dropped them again. Nevertheless, free disk space in HDFS decreased by much more than it should have. While the psql session on the master was still open, I investigated on one of the slave machines.
> HDFS is stored on /mds:
> {noformat}
> [root@mds-hdp-04 ~]# ls -l /mds
> total 36
> drwxr-xr-x. 3 root      root    4096 Jun 14 04:23 falcon
> drwxr-xr-x. 3 root      root    4096 Jun 14 04:42 hdfs
> drwx------. 2 root      root   16384 Jun  8 02:48 lost+found
> drwxr-xr-x. 5 storm     hadoop  4096 Jun 14 04:45 storm
> drwxr-xr-x. 4 root      root    4096 Jun 14 04:43 yarn
> drwxr-xr-x. 2 zookeeper hadoop  4096 Jun 14 04:39 zookeeper
> [root@mds-hdp-04 ~]# df /mds
> Filesystem     1K-blocks      Used Available Use% Mounted on
> /dev/vdc       515928320 314560220 175137316  65% /mds
> [root@mds-hdp-04 ~]# du -s /mds
> 89918952	/mds
> {noformat}
> Note that there is a difference of more than 200 GB between the disk space used according to df and the total size of all files on that file system according to du.
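The size of that gap can be checked directly from the numbers above. This is a minimal Python sketch that assumes both tools report 1 KiB blocks: `df` states this in its header, and it is GNU `du`'s default without `-h` or `--block-size`. The constant and function names are illustrative only.

```python
# Figures copied from the df and du output above, in 1 KiB blocks.
DF_USED_KIB = 314560220   # "Used" column of `df /mds`
DU_TOTAL_KIB = 89918952   # total printed by `du -s /mds`

def gap_gib(df_used_kib, du_total_kib):
    """Space that df counts as used but du cannot attribute to any file, in GiB."""
    return (df_used_kib - du_total_kib) / 1024**2

if __name__ == "__main__":
    print(f"unaccounted: {DF_USED_KIB - DU_TOTAL_KIB} KiB = "
          f"{gap_gib(DF_USED_KIB, DU_TOTAL_KIB):.1f} GiB")
```

Running it prints `unaccounted: 224641268 KiB = 214.2 GiB`, which is roughly 230 GB in decimal units.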
> The culprit turned out to be several postgres processes, running as gpadmin, that hold open file descriptors to deleted files. Here are the first few:
> {noformat}
> [root@mds-hdp-04 ~]# lsof +L1 | grep /mds/hdfs | head -10
> postgres 665334 gpadmin   18r   REG 253,32 134217728     0  9438234 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922482 (deleted)
> postgres 665334 gpadmin   34r   REG 253,32     24488     0  9438114 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398 (deleted)
> postgres 665334 gpadmin   35r   REG 253,32       199     0  9438115 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398_187044.meta (deleted)
> postgres 665334 gpadmin   37r   REG 253,32 134217728     0  9438208 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922446 (deleted)
> postgres 665334 gpadmin   38r   REG 253,32   1048583     0  9438209 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922446_187092.meta (deleted)
> postgres 665334 gpadmin   39r   REG 253,32   1048583     0  9438235 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922482_187128.meta (deleted)
> postgres 665334 gpadmin   40r   REG 253,32 134217728     0  9438262 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922555 (deleted)
> postgres 665334 gpadmin   41r   REG 253,32   1048583     0  9438263 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922555_187201.meta (deleted)
> postgres 665334 gpadmin   42r   REG 253,32 134217728     0  9438285 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir194/blk_1073922602 (deleted)
> postgres 665334 gpadmin   43r   REG 253,32   1048583     0  9438286 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir194/blk_1073922602_187248.meta (deleted)
> {noformat}
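`lsof +L1` selects open files whose link count is below 1, meaning files that have been deleted but are still held open. On Linux the same information can be read from `/proc/<pid>/fd`, and reading it there makes it easy to total how much space such files pin. The following is a minimal sketch, assuming a Linux host with `/proc` mounted and enough privilege to inspect other users' processes (root, as in the session above). `deleted_open_files` is a hypothetical helper and is not part of HAWQ, HDFS or lsof.

```python
import os
import stat

DELETED_SUFFIX = " (deleted)"  # Linux appends this to readlink() of an unlinked file

def deleted_open_files(prefix="/", pids=None):
    """Hypothetical helper approximating `lsof +L1 | grep PREFIX` on Linux.

    Yields (pid, fd, path, size) for every open regular file whose link
    count is 0 and whose original path starts with `prefix`. Processes or
    descriptors that vanish or cannot be inspected are skipped.
    """
    if pids is None:
        pids = [int(name) for name in os.listdir("/proc") if name.isdigit()]
    for pid in pids:
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or not permitted
            continue
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
                st = os.stat(link)  # follows the link to the open file itself
            except OSError:  # descriptor closed meanwhile, or not permitted
                continue
            if not stat.S_ISREG(st.st_mode) or st.st_nlink != 0:
                continue  # keep only regular files with no links left (+L1)
            if target.endswith(DELETED_SUFFIX):
                target = target[: -len(DELETED_SUFFIX)]
            if target.startswith(prefix):
                yield pid, int(fd), target, st.st_size

if __name__ == "__main__":
    held = list(deleted_open_files("/mds/hdfs"))
    for pid, fd, path, size in held[:10]:
        print(pid, fd, size, path)
    total = sum(entry[3] for entry in held)
    print(f"{len(held)} deleted files still open, {total / 2**30:.1f} GiB pinned")
```

Summing `st_size` over the results gives an estimate of the space that `du` cannot see. This is only an estimate, because logical size and allocated blocks can differ for sparse files.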
> As soon as I close the psql session on the master, the disk space is freed on the slaves:
> {noformat}
> [root@mds-hdp-04 ~]# df /mds
> Filesystem     1K-blocks     Used Available Use% Mounted on
> /dev/vdc       515928320 89992720 399704816  19% /mds
> [root@mds-hdp-04 ~]# du -s /mds
> 89918952	/mds
> [root@mds-hdp-04 ~]# lsof +L1 | grep /mds/hdfs | head -10
> {noformat}
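This matches ordinary POSIX `unlink()` semantics. Removing a file's last directory entry drops its link count to 0, but the kernel only frees its blocks once the last open descriptor is closed. The numbers above are consistent with that. `Used` fell from 314560220 to 89992720 1K-blocks, a drop of 224567500 KiB (about 214 GiB), which leaves df and du only 73768 KiB (about 72 MiB) apart. Below is a minimal Python sketch of the mechanism on a local temporary file. The helper name is illustrative only.

```python
import os
import tempfile

def unlink_while_open(size=65536):
    """Create a file, unlink it while a descriptor is still open, and inspect it.

    Returns (link_count, size_seen_by_fd, bytes_read_back, path_still_exists),
    all measured after the unlink but before the descriptor is closed.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * size)
        os.unlink(path)               # directory entry removed: `du` stops counting it
        st = os.fstat(fd)             # but the inode itself is still alive
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, size)      # and its contents are still readable
        return st.st_nlink, st.st_size, len(data), os.path.exists(path)
    finally:
        os.close(fd)                  # last reference dropped: the kernel can free the blocks

if __name__ == "__main__":
    print(unlink_while_open())
```

In the report, the descriptors belong to postgres processes on the segment host. The space therefore comes back only when those processes close them, which here happened when the psql session on the master ended.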
> I believe this is a bug. At the very least, it looks like very undesirable behavior to me.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)