Posted to hdfs-user@hadoop.apache.org by Dhruba Borthakur <dh...@gmail.com> on 2011/01/03 06:36:57 UTC

Re: Profiling HDFS

when a datanode dies, any write pipeline that was using that datanode is
affected to some extent. The writer goes through an error recovery
protocol that can introduce delays in the write pipeline. On the other
hand, write pipelines that do not include the dead datanode should
not be impacted at all.
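To make the failure mode concrete, here is a hypothetical, greatly simplified
sketch of the client-side behavior (plain Java; the real logic lives in the
DFSClient's streamer thread and is far more involved, and the node names here
are invented):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Toy model of an HDFS write pipeline: a packet is acked only when every
// datanode in the pipeline is alive. On a bad ack, the client removes the
// failed node and retries -- that rebuild-and-retry is the stall the
// writer observes. (Illustration only, not the actual DataStreamer code.)
public class PipelineSketch {
    private final List<String> pipeline;
    private int recoveries = 0;

    PipelineSketch(List<String> datanodes) {
        this.pipeline = new ArrayList<>(datanodes);
    }

    /** Send one packet; returns true once it is acked by all live replicas. */
    boolean sendPacket(Set<String> deadNodes) {
        for (String dn : new ArrayList<>(pipeline)) {
            if (deadNodes.contains(dn)) {
                pipeline.remove(dn);          // recovery: drop the dead datanode
                recoveries++;                 // each recovery costs wall-clock time
                return sendPacket(deadNodes); // resend on the shorter pipeline
            }
        }
        return !pipeline.isEmpty();           // acked by the surviving replicas
    }

    int recoveries() { return recoveries; }
    List<String> pipeline() { return pipeline; }

    public static void main(String[] args) {
        // Replication factor 2: two datanodes in the pipeline; dn2 dies
        // mid-write, so one recovery, then writes continue on dn1 alone.
        PipelineSketch w = new PipelineSketch(Arrays.asList("dn1", "dn2"));
        boolean acked = w.sendPacket(Set.of("dn2"));
        System.out.println("acked=" + acked + " recoveries=" + w.recoveries());
    }
}
```

The point of the sketch is only that the recovery happens per affected
pipeline: a writer whose pipeline never contained the dead node takes the
fast path through the loop and pays no recovery cost.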

thanks
dhruba


On Wed, Dec 29, 2010 at 2:57 AM, Rajat Goel <ra...@gmail.com> wrote:

> I open a new file every 5 minutes: I keep writing to the current file for 5
> minutes, then close it and open a new one for writing. My block size is
> 256 MB, and the replication factor is 2.
>
> This is my test scenario: I am using a cluster of 6 machines (1 namenode, 5
> datanodes). On each datanode, I run two threads (one writing to HDFS at
> 10 MB/s and the other reading from HDFS at 20 MB/s). I shut down one of the
> datanodes manually and see that my write threads on the live datanodes are
> no longer able to write to HDFS at 10 MB/s; the write speed drops. The
> problem is that writes on live datanodes are affected by a datanode going
> dead.
>
> I suspect that this may be due to live nodes trying to replicate their
> blocks to the dead datanode. I see java.io exceptions on the terminals of
> the live datanodes reporting a bad ack from the dead machine.
>
> Can you please tell us how exactly writes and replication behave when a
> datanode goes down?
>
> Regards,
> Rajat
>
>
> On Wed, Dec 29, 2010 at 11:17 AM, Dhruba Borthakur <dh...@gmail.com> wrote:
>
>> How frequently do you open new files to write? Or do you continue to write
>> to the same file(s) for the entire duration of the test? What is your block
>> size? Can you please elaborate on your test workload?
>>
>>
>> On Tue, Dec 28, 2010 at 9:45 PM, Rajat Goel <ra...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I want to measure read/write rates to HDFS under various conditions, such
>>> as heavy load or a datanode going down. Is there a profiler already
>>> available for this purpose?
>>>
>>> I am pushing data to HDFS at a high rate, reads are also happening in
>>> parallel, and I suddenly reboot one datanode. I observe that I am no
>>> longer able to write to HDFS (from live datanodes) at the same high rate.
>>> This lasts for a while (around 30 minutes), after which things go back to
>>> normal. I want to find out why HDFS becomes slow, what the main
>>> contributor to this latency is, and whether I can improve this behavior
>>> by changing some configuration parameters.
>>>
>>> Thanks & Regards,
>>> Rajat
>>>
>>
>>
>>
>> --
>> Connect to me at http://www.facebook.com/dhruba
>>
>
>


-- 
Connect to me at http://www.facebook.com/dhruba

Re: Profiling HDFS

Posted by Rajat Goel <ra...@gmail.com>.
Hi Dhruba,

Can you please explain a bit more about this error recovery protocol and the
delay it could introduce? Can we control this delay via some HDFS
configuration parameters? I have already tried setting
dfs.client.block.write.retries to 0 and
dfs.namenode.heartbeat.recheck-interval to 1000.
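For reference, these two settings would go in hdfs-site.xml roughly as below
(a sketch using the values from this thread, not recommendations). Note that
the namenode only declares a datanode dead after roughly
2 * dfs.namenode.heartbeat.recheck-interval + 10 * heartbeat intervals
(about 10.5 minutes with stock settings), so lowering the recheck interval
mainly shortens that detection window rather than the per-write recovery:

```xml
<!-- hdfs-site.xml (sketch): values taken from this thread -->
<configuration>
  <!-- Client side: how many times the writer retries a failed block write -->
  <property>
    <name>dfs.client.block.write.retries</name>
    <value>0</value>
  </property>
  <!-- Namenode side: how often dead-datanode checks run, in milliseconds.
       Time to declare a node dead is roughly
       2 * recheck-interval + 10 * heartbeat-interval. -->
  <property>
    <name>dfs.namenode.heartbeat.recheck-interval</name>
    <value>1000</value>
  </property>
</configuration>
```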

Thanks,
Rajat
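
As a starting point for the rate measurements discussed in this thread, a
minimal timing harness can be sketched in plain Java (FileOutputStream or
any OutputStream stands in for the HDFS stream here; against a real cluster
you would pass the stream returned by the Hadoop FileSystem.create() call
instead):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Minimal write-throughput probe (sketch). Writes totalBytes in chunk-sized
// packets to the given stream and reports the achieved MB/s.
public class WriteProbe {
    static double measureMBps(OutputStream out, long totalBytes, int chunk)
            throws IOException {
        byte[] buf = new byte[chunk];
        long start = System.nanoTime();
        for (long written = 0; written < totalBytes; written += chunk) {
            // Last packet may be short if totalBytes is not a multiple of chunk.
            out.write(buf, 0, (int) Math.min(chunk, totalBytes - written));
        }
        out.flush();
        double seconds = (System.nanoTime() - start) / 1e9;
        return (totalBytes / (1024.0 * 1024.0)) / seconds;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sink just to demonstrate the harness.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        double mbps = measureMBps(sink, 8L * 1024 * 1024, 64 * 1024);
        System.out.printf("wrote 8 MB at %.1f MB/s%n", mbps);
    }
}
```

Sampling this periodically from the writer thread (e.g. once per 5-minute
file) would show exactly when the rate dips after the datanode is killed and
how long the dip lasts.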

On Mon, Jan 3, 2011 at 11:06 AM, Dhruba Borthakur <dh...@gmail.com> wrote:

> when a datanode dies, any write pipeline that was using that datanode gets
> affected to a certain extent. The writer goes through an error recovery
> protocol that could introduce delays in the write pipeline. On the other
> hand, other write pipelines that do not encompass the dead datanode should
> not be impacted at all.
>
> thanks
> dhruba