
Posted to hdfs-user@hadoop.apache.org by Davey Yan <da...@gmail.com> on 2013/08/02 03:55:25 UTC

DataBlockScanner's rate limit

I recently got a mini cluster corrupted through an improper procedure on my part.

This mini cluster's dfs.replication was set to 1.
After an irregular OS restart, I could not wait for it to leave safemode: the
block ratio was 0.9862, below the 0.999 threshold.
In http://ip:50075/blockScannerReport, I noticed that the scan rate is
limited to 1MB/s, so verifying all the blocks would take a long time.
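For context, here is a simplified model of the safemode exit check: the NameNode stays in safemode until the fraction of blocks reported by DataNodes reaches the threshold (0.999 here). This is an illustrative sketch, not the actual NameNode code; the function names and the 10,000-block total are hypothetical. It shows how many blocks must still be reported at a 0.9862 ratio, which with dfs.replication=1 can only happen if the single replica of each missing block comes back.

```python
from fractions import Fraction
import math

# Threshold seen in this thread; Fraction avoids float rounding surprises.
SAFEMODE_THRESHOLD = Fraction(999, 1000)


def can_leave_safemode(safe_blocks: int, total_blocks: int,
                       threshold: Fraction = SAFEMODE_THRESHOLD) -> bool:
    """Simplified check (hypothetical helper): safe / total >= threshold."""
    if total_blocks == 0:
        return True  # no blocks to wait for
    return Fraction(safe_blocks, total_blocks) >= threshold


def blocks_still_needed(safe_blocks: int, total_blocks: int,
                        threshold: Fraction = SAFEMODE_THRESHOLD) -> int:
    """How many more blocks must be reported before the threshold is met."""
    required = math.ceil(total_blocks * threshold)
    return max(0, required - safe_blocks)


if __name__ == "__main__":
    total = 10_000   # hypothetical block count
    safe = 9_862     # 0.9862 of the total, the ratio reported in this thread
    print(can_leave_safemode(safe, total))
    print(blocks_still_needed(safe, total))
```

With these inputs, 9,990 blocks are required, so 128 are still outstanding. Forcing the NameNode out of safemode does not bring them back; it only stops waiting for them.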

So I ran "hadoop dfsadmin -safemode leave", and then I got missing blocks.

My question is: why should we limit the rate in DataBlockScanner while the
cluster is still starting up or still in safemode?

I read the source code of DataBlockScanner.java, and there is no parameter
to change the rate limit.
It seems to always be between 1MB and 8MB.
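The 1MB-to-8MB range described above is consistent with a throttle that spreads the remaining bytes over the remaining scan period and then clamps the result to fixed bounds. The sketch below assumes exactly that behavior, with the bounds taken from this post and treated as bytes per second. The three-week scan period, the 500 GB data size, and the helper names are hypothetical inputs for illustration, not values read from this cluster or from DataBlockScanner.java.

```python
# Bounds as described in the post; assumed to be bytes per second.
MIN_SCAN_RATE = 1 * 1024 * 1024   # 1 MB/s
MAX_SCAN_RATE = 8 * 1024 * 1024   # 8 MB/s


def scan_bandwidth(bytes_left: int, seconds_left: float) -> int:
    """Assumed throttle: spread remaining bytes over remaining time, then clamp."""
    if seconds_left <= 0:
        return MAX_SCAN_RATE  # period already over: scan as fast as allowed
    wanted = int(bytes_left / seconds_left)
    return min(max(wanted, MIN_SCAN_RATE), MAX_SCAN_RATE)


def full_scan_hours(bytes_total: int, bandwidth: int) -> float:
    """Time to read every block once at a fixed bandwidth, in hours."""
    return bytes_total / bandwidth / 3600


if __name__ == "__main__":
    period = 3 * 7 * 24 * 3600   # hypothetical three-week scan period
    data = 500 * 1024**3         # hypothetical 500 GB on one DataNode
    bw = scan_bandwidth(data, period)
    print(bw == MIN_SCAN_RATE)
    print(round(full_scan_hours(data, bw), 1))
```

Under this assumption, a DataNode whose data easily fits in the scan period sits at the 1 MB/s floor, which would match the rate seen in blockScannerReport; reading 500 GB at that rate takes about 142 hours.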


-- 
Davey Yan

Re: DataBlockScanner's rate limit

Posted by Radim Kolar <hs...@filez.com>.

> Another question: will a single replication factor often lead to missing
> blocks?

Yes.
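A rough way to see why: if each replica is lost independently with probability p, a block goes missing only when all r of its replicas are lost, so the expected number of missing blocks among N is N * p^r. Independence is a simplifying assumption (real failures, such as a whole disk or mount going away, are correlated), and the block count and loss probability below are hypothetical.

```python
def expected_missing_blocks(num_blocks: int, p_replica_loss: float,
                            replication: int) -> float:
    """Expected missing blocks if every replica is lost independently."""
    if not 0.0 <= p_replica_loss <= 1.0:
        raise ValueError("p_replica_loss must be a probability")
    if replication < 1:
        raise ValueError("replication must be at least 1")
    # A block is missing only if all of its replicas are lost.
    return num_blocks * p_replica_loss ** replication


if __name__ == "__main__":
    for r in (1, 2, 3):
        print(r, expected_missing_blocks(10_000, 0.01, r))
```

Each extra replica divides the expected loss by 1/p (100x here), which is why a replication factor of 1 leaves no margin for even a small fraction of lost replicas.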

Re: DataBlockScanner's rate limit

Posted by Davey Yan <da...@gmail.com>.

Hi Harsh, thanks for the reply.

Yes, dfs.replication was set to 1, but no mount is missing.
Another question: will a single replication factor often lead to missing
blocks?
After startup, will the ratio reported in the admin UI (e.g. 0.9826) stay
unchanged, even while the DataBlockScanner is still running?




On Fri, Aug 2, 2013 at 11:27 AM, Harsh J <ha...@cloudera.com> wrote:

> Hi,
>
> The DataBlockScanner isn't responsible for the DN block reports at
> startup; those come from a wholly different thread/process. The scanner
> is a NN-independent operation that merely verifies blocks in the
> background for the DN's own health. Depending on what the outage caused,
> it is likely that you are missing a mount, and perhaps blocks of files
> with a single replica. Run an fsck to identify which files these are and
> whether they used a single replication factor.
>
> On Fri, Aug 2, 2013 at 7:25 AM, Davey Yan <da...@gmail.com> wrote:
> > I recently got a mini cluster corrupted through an improper procedure on my part.
> >
> > This mini cluster's dfs.replication was set to 1.
> > After an irregular OS restart, I could not wait for it to leave safemode: the
> > block ratio was 0.9862, below the 0.999 threshold.
> > In http://ip:50075/blockScannerReport, I noticed that the scan rate is
> > limited to 1MB/s, so verifying all the blocks would take a long time.
> >
> > So I ran "hadoop dfsadmin -safemode leave", and then I got missing blocks.
> >
> > My question is: why should we limit the rate in DataBlockScanner while the
> > cluster is still starting up or still in safemode?
> >
> > I read the source code of DataBlockScanner.java, and there is no parameter
> > to change the rate limit.
> > It seems to always be between 1MB and 8MB.
> >
> >
> > --
> > Davey Yan
>
>
>
> --
> Harsh J
>



-- 
Davey Yan

Re: DataBlockScanner's rate limit

Posted by Harsh J <ha...@cloudera.com>.

Hi,

The DataBlockScanner isn't responsible for the DN block reports at
startup; those come from a wholly different thread/process. The scanner
is a NN-independent operation that merely verifies blocks in the
background for the DN's own health. Depending on what the outage caused,
it is likely that you are missing a mount, and perhaps blocks of files
with a single replica. Run an fsck to identify which files these are and
whether they used a single replication factor.
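A minimal sketch of running that check from a script, assuming a Hadoop 1.x client with the `hadoop fsck` command on the PATH. The `-files -blocks -locations` options list each file, its blocks, and the DataNodes holding their replicas, which is enough to spot files whose blocks had only one replica. The helper names are hypothetical.

```python
import subprocess
from typing import List


def fsck_command(path: str = "/") -> List[str]:
    """Build an fsck invocation that lists files, their blocks, and locations."""
    return ["hadoop", "fsck", path, "-files", "-blocks", "-locations"]


def run_fsck(path: str = "/") -> str:
    """Run fsck and return its report; requires a Hadoop client on PATH."""
    result = subprocess.run(fsck_command(path), capture_output=True,
                            text=True, check=False)
    # check=False: keep the report even if fsck exits with a non-zero status.
    return result.stdout


if __name__ == "__main__":
    print(" ".join(fsck_command("/")))
```

Only `fsck_command` is exercised when run as a script; calling `run_fsck("/")` on a cluster node returns the full report for inspection.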

On Fri, Aug 2, 2013 at 7:25 AM, Davey Yan <da...@gmail.com> wrote:
> I recently got a mini cluster corrupted through an improper procedure on my part.
>
> This mini cluster's dfs.replication was set to 1.
> After an irregular OS restart, I could not wait for it to leave safemode: the
> block ratio was 0.9862, below the 0.999 threshold.
> In http://ip:50075/blockScannerReport, I noticed that the scan rate is
> limited to 1MB/s, so verifying all the blocks would take a long time.
>
> So I ran "hadoop dfsadmin -safemode leave", and then I got missing blocks.
>
> My question is: why should we limit the rate in DataBlockScanner while the
> cluster is still starting up or still in safemode?
>
> I read the source code of DataBlockScanner.java, and there is no parameter
> to change the rate limit.
> It seems to always be between 1MB and 8MB.
>
>
> --
> Davey Yan

-- 
Harsh J