Posted to user@hbase.apache.org by Wei-Chiu Chuang <we...@cloudera.com.INVALID> on 2019/06/02 00:05:23 UTC

Re: Disk hot swap for data node while hbase use short-circuit

I think I found a similar bug report that matches your symptom: HDFS-12204
<https://issues.apache.org/jira/browse/HDFS-12204> (Dfsclient Do not close
file descriptor when using shortcircuit)

On Wed, May 29, 2019 at 11:37 PM Kang Minwoo <mi...@outlook.com>
wrote:

> I think these files were opened for reads, because those blocks are finalized.
>
> ---
> ls -al /proc/regionserver_pid/fd
> 902 -> /data_path/current/finalized/~/blk_1 (deleted)
> 946 -> /data_path/current/finalized/~/blk_2 (deleted)
> 947 -> /data_path/current/finalized/~/blk_3.meta (deleted)
> ---
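>
> (As a side note, a minimal sketch using only standard Linux tools to count
> just the stale FDs on the affected data directory; regionserver_pid and
> /data_path are the same placeholders as in the listing above:)
>
> ---
> # count region server FDs that still point at deleted files
> # under the affected data directory
> ls -l /proc/regionserver_pid/fd 2>/dev/null \
>   | grep '(deleted)' \
>   | grep '/data_path/' \
>   | wc -l
> ---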
>
> I think it is not an HBase bug, because the DFSClient only checks for
> stale fds when the fetch method is invoked.
>
> Best regards,
> Minwoo Kang
>
> ________________________________________
> From: Wei-Chiu Chuang <we...@cloudera.com.INVALID>
> Sent: Wednesday, May 29, 2019 20:51
> To: user@hbase.apache.org
> Subject: Re: Disk hot swap for data node while hbase use short-circuit
>
> Do you have a list of the files that were open? I'd like to know whether
> those files were opened for writes or for reads.
>
> If you are on a more recent version of Hadoop (2.8.0 and above), there is
> an HDFS command to interrupt ongoing writes to DataNodes (HDFS-9945
> <https://issues.apache.org/jira/browse/HDFS-9945>):
>
>
> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfsadmin
> hdfs dfsadmin -evictWriters
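>
> For example (the hostname below is a placeholder; the port is the DataNode
> IPC port, which defaults to 50020 in Hadoop 2.x):
>
> # evict clients currently writing to this DataNode (placeholder host)
> hdfs dfsadmin -evictWriters datanode1.example.com:50020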
>
> Looking at the HDFS hot-swap implementation, it looks like the DataNode
> doesn't interrupt writers when a volume is removed. That sounds like a bug.
>
> On Tue, May 28, 2019 at 9:39 PM Kang Minwoo <mi...@outlook.com>
> wrote:
>
> > Hello, Users.
> >
> > I use JBOD for the data nodes. Sometimes a disk in a data node has a
> > problem.
> >
> > At first, I shut down every instance, including the data node and the
> > region server, on the machine that has the disk problem.
> > But that is not a good solution, so I improved the process.
> >
> > Now, when I detect a disk problem on a server, I just perform a disk hot swap.
> >
> > But the system administrator complains that some FDs are still open, so
> > they cannot remove the disk.
> > The region server holds those FDs because I use the short-circuit reads
> > feature. (HBase version 1.2.9)
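> >
> > (For reference, the short-circuit settings in play can be checked with
> > hdfs getconf; a minimal sketch using the standard HDFS property names:)
> >
> > ---
> > # must be true for short-circuit reads to be used at all
> > hdfs getconf -confKey dfs.client.read.shortcircuit
> > # UNIX domain socket the DataNode and DFSClient share for passing FDs
> > hdfs getconf -confKey dfs.domain.socket.path
> > ---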
> >
> > When we first hit this issue, we force-unmounted the disk and remounted it.
> > But after this process, the kernel reported an error[1].
> >
> > So now we avoid this issue by purging the stale FDs.
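> >
> > (For context, a minimal sketch of how to check the client-side cache
> > settings that control how long short-circuit FDs are held before they
> > are closed; the property names are the standard HDFS keys:)
> >
> > ---
> > # how many short-circuit streams (FDs) the DFSClient keeps cached
> > hdfs getconf -confKey dfs.client.read.shortcircuit.streams.cache.size
> > # how long an idle cached stream may live before it is closed (ms)
> > hdfs getconf -confKey dfs.client.read.shortcircuit.streams.cache.expiry.ms
> > ---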
> >
> > I think this issue is common, and I want to know how other hbase-users
> > deal with it.
> >
> > Thank you very much for sharing your experience.
> >
> > Best regards,
> > Minwoo Kang
> >
> > [1]:
> >
> https://www.thegeekdiary.com/xfs_log_force-error-5-returned-xfs-error-centos-rhel-7/
> >
>

Re: Disk hot swap for data node while hbase use short-circuit

Posted by Josh Elser <el...@apache.org>.
Reminds me of https://issues.apache.org/jira/browse/HBASE-21915 too.
I agree with Wei-Chiu that I'd start by ruling out HDFS issues first, and
then start worrying about HBase issues :)
