Posted to hdfs-dev@hadoop.apache.org by Eitan Rosenfeld <ei...@gmail.com> on 2013/06/11 00:33:28 UTC

Block deletion after benchmarks

Hi all,

In my two-datanode cluster, I require that file operations on the
underlying filesystem take place in the same order.  Essentially, I wish
for blocks to be created, written, and/or deleted deterministically across
datanodes.

However, this is not the case towards the end of the TestDFSIO benchmark.
Several blocks are deleted, but each datanode performs this deletion at a
*different time* relative to the last few blocks being written.

What component is initiating the block deletion at the end of the
benchmark?

(It seems to be the Replication Monitor, but I'm unclear on what causes the
Replication Monitor to suddenly run and delete blocks at the end of the
benchmark).  I am using Hadoop 1.0.4.

Thank you,
Eitan Rosenfeld

Re: Block deletion after benchmarks

Posted by Harsh J <ha...@cloudera.com>.
Eitan,

I don't completely get your question. TestDFSIO is a test that
creates several files to exercise I/O and then deletes them at the
end of the run.

Block deletion in HDFS is an asynchronous process. File deletions are
instantaneous (a single transaction in the namespace), but the
deletions of the identified blocks are carried out progressively over
DN heartbeats and are throttled (to avoid a storm of deletes
affecting DN memory usage). You can look at
dfs.namenode.invalidate.work.pct.per.iteration
in http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
to tune this and make it go faster, but I'm not sure I got your
question right. The test just uses FS APIs; the FS simply has a
different data (not file) deletion behavior - the test isn't
responsible for that.
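[Editor's note: the heartbeat-driven throttling described above can be
sketched as a toy model. This is not actual HDFS code; the batch limit
and block names below are illustrative only.]

```python
# Simplified model of HDFS's throttled block invalidation (not actual
# HDFS code). The NameNode queues blocks-to-delete per DataNode and, on
# each heartbeat, hands out only a bounded batch of deletions. Because
# each DataNode heartbeats on its own schedule, two DataNodes purge the
# replicas of the same deleted file at different times.

from collections import deque

BLOCK_INVALIDATE_LIMIT = 3  # illustrative; the real per-DN limit is larger


def heartbeat(invalidate_queue, limit=BLOCK_INVALIDATE_LIMIT):
    """Return the batch of block IDs one DataNode is told to delete on a
    single heartbeat; the remaining blocks wait for later heartbeats."""
    batch = []
    while invalidate_queue and len(batch) < limit:
        batch.append(invalidate_queue.popleft())
    return batch


# Deleting one 10-block file removes it from the namespace at once, but
# the replicas are invalidated over several heartbeats, not immediately.
queue = deque(f"blk_{i}" for i in range(10))
rounds = []
while queue:
    rounds.append(heartbeat(queue))
# 10 blocks at 3 per heartbeat -> spread over 4 heartbeats
```

This is why the deletions in the question land at a *different time*
on each DataNode relative to the last writes: the purge is driven by
each node's own heartbeat cycle, not by a synchronized event.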

On Tue, Jun 11, 2013 at 4:03 AM, Eitan Rosenfeld <ei...@gmail.com> wrote:
> Hi all,
>
> In my two-datanode cluster, I require that file operations on the
> underlying filesystem take place in the same order.  Essentially, I wish
> for blocks to be created, written, and/or deleted deterministically across
> datanodes.
>
> However, this is not the case towards the end of the TestDFSIO benchmark.
> Several blocks are deleted, but each datanode performs this deletion at a
> *different time* relative to the last few blocks being written.
>
> What component is initiating the block deletion at the end of the
> benchmark?
>
> (It seems to be the Replication Monitor, but I'm unclear on what causes the
> Replication Monitor to suddenly run and delete blocks at the end of the
> benchmark).  I am using Hadoop 1.0.4.
>
> Thank you,
> Eitan Rosenfeld



-- 
Harsh J