Posted to hdfs-user@hadoop.apache.org by sravankumar <sr...@huawei.com> on 2011/01/13 18:32:45 UTC

Speeding up Data Deletion From Datanodes

Hi,

I have gone through the file deletion flow and learned that the
Replication Monitor is responsible for file deletions, and that these
configurations affect block deletion:

INVALIDATE_WORK_PCT_PER_ITERATION
BLOCK_INVALIDATE_CHUNK

Can anyone suggest how to tune these configurations to speed up block
deletion, and explain the significance of the
INVALIDATE_WORK_PCT_PER_ITERATION constant, which by default is 32?

Also, can the heartbeat interval be tuned based on cluster size? For a
10-node cluster, say, how should these configurations be set? Is there
any documentation on tuning the configuration for a cluster's size and
usage?

Thanks & Regards,
Sravan kumar.
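
A rough sketch of the arithmetic behind those two constants, assuming
the 0.20-era semantics (the Replication Monitor wakes up every
dfs.replication.interval, 3 seconds by default, hands invalidation work
to INVALIDATE_WORK_PCT_PER_ITERATION percent of the live datanodes per
pass, and schedules at most about BLOCK_INVALIDATE_CHUNK = 100 block
deletions per node per pass; names and defaults may differ in other
versions):

    // Back-of-the-envelope upper bound on how fast block deletions get
    // scheduled. All constants below are assumed 0.20-era defaults.
    public class InvalidateWorkEstimate {
        static final double INVALIDATE_WORK_PCT = 32;  // % of live nodes per pass
        static final int BLOCK_INVALIDATE_CHUNK = 100; // max blocks per node per pass
        static final double RECHECK_SECONDS = 3;       // dfs.replication.interval

        public static void main(String[] args) {
            int liveNodes = 10; // the 10-node cluster from the question
            int nodesPerPass =
                    (int) Math.ceil(liveNodes * INVALIDATE_WORK_PCT / 100);
            double blocksPerSecond =
                    nodesPerPass * BLOCK_INVALIDATE_CHUNK / RECHECK_SECONDS;
            // 10 nodes -> ceil(3.2) = 4 nodes * 100 blocks / 3 s ~= 133/s
            System.out.printf("~%.0f blocks/s scheduled for deletion%n",
                    blocksPerSecond);
        }
    }

By that arithmetic, raising INVALIDATE_WORK_PCT_PER_ITERATION spreads
deletion work over more nodes per pass, and raising
BLOCK_INVALIDATE_CHUNK sends more deletions to each node, at the cost
of more per-heartbeat work on the datanode side.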

 


Re: Speeding up Data Deletion From Datanodes

Posted by Todd Lipcon <to...@cloudera.com>.
On Sun, Jan 16, 2011 at 8:39 AM, Mag Gam <ma...@gmail.com> wrote:

> I am curious now...
>
> If you have a cluster of 10 nodes, what should the heartbeat be set
> to? What about 100, or 1000?
>

The heartbeat interval auto-tunes based on the size of the cluster. I've
never seen anyone tune these settings (yet).
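
(If you do want to experiment, the knob is dfs.heartbeat.interval in
hdfs-site.xml, which defaults to 3 seconds; I'm assuming that's the
setting being asked about.)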


>
>
> I too am interested in tuning documentation. For example, how much
> memory should we allocate to the JVM? How much for the namenode? And
> so on.
>
>
My rough guide (with a decent fudge factor built in) is 1GB of RAM on
the NN per million files. This is obviously a rule of thumb, since the
amount of RAM taken by a file depends on a lot of factors (number of
blocks, length of the filename, etc.), but it should give you a decent
idea.

Konstantin Shvachko did some better analysis on this a few months back:

http://www.usenix.org/publications/login/2010-04/openpdfs/shvachko.pdf

His number is 0.6GB per 1M files, but I'm not sure whether that includes
a "safety factor".

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Speeding up Data Deletion From Datanodes

Posted by Mag Gam <ma...@gmail.com>.
I am curious now...

If you have a cluster of 10 nodes, what should the heartbeat be set
to? What about 100, or 1000?


I too am interested in tuning documentation. For example, how much
memory should we allocate to the JVM? How much for the namenode? And so
on.



On Thu, Jan 13, 2011 at 1:22 PM, Todd Lipcon <to...@cloudera.com> wrote:
> Hi Sravan,
> You may want to consider backporting HDFS-611 (or using CDH3b3, which
> includes this backport, if you aren't in the mood to patch it yourself).
> -Todd
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: Speeding up Data Deletion From Datanodes

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Sravan,

You may want to consider backporting HDFS-611 (or using CDH3b3, which
includes this backport, if you aren't in the mood to patch it yourself).

-Todd

-- 
Todd Lipcon
Software Engineer, Cloudera
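
As background on the HDFS-611 suggestion: as I recall, the issue it
fixes is that datanodes deleted block files synchronously while
processing the namenode's invalidation commands, so a big burst of
deletions could hold up heartbeats; the patch moves the disk I/O to
background threads. A minimal sketch of that idea, with illustrative
names only (this is not the actual patch):

    import java.io.File;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Sketch of the idea behind HDFS-611: take block-file deletion off
    // the heartbeat path. Names are illustrative, not from the patch.
    class AsyncBlockDeleter {
        private final ExecutorService deleter =
                Executors.newSingleThreadExecutor();

        // Called while processing an invalidate command; returns at once.
        void scheduleDelete(final File blockFile, final File metaFile) {
            deleter.execute(new Runnable() {
                public void run() {
                    // The actual disk I/O happens here, off the
                    // heartbeat/command-processing thread.
                    blockFile.delete();
                    metaFile.delete();
                }
            });
        }

        void shutdown() {
            deleter.shutdown();
        }
    }

If memory serves, the real change uses per-volume worker threads so
deletions on one disk don't queue behind another; a single executor is
enough to show the shape.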