Posted to hdfs-user@hadoop.apache.org by Todd Lipcon <to...@cloudera.com> on 2010/01/13 22:31:53 UTC

Re: Exponential performance decay when inserting large number of blocks

Hi Zlatin,

This is a very interesting test you've run, and certainly not expected
results. I know of many clusters happily chugging along with millions of
blocks, so problems at 400K are very strange. By any chance were you able to
collect profiling information from the NameNode while running this test?

That said, I hope you've set the block size to 1KB for the purpose of this
test and not because you expect to run that in production. Recommended block
sizes are at least 64MB and often 128MB or 256MB for larger clusters.
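
For illustration, the block size does not have to be changed cluster-wide for a
test like this; it can be set per file at create time. A minimal sketch using the
FileSystem API (the path, payload, and sizes below are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutWithBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();      // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);
    long blockSize = 128L * 1024 * 1024;           // 128 MB; 1 KB only makes sense as a stress test
    // create(path, overwrite, bufferSize, replication, blockSize)
    FSDataOutputStream out =
        fs.create(new Path("/test/bigData"), true, 64 * 1024, (short) 3, blockSize);
    out.write(new byte[1024 * 1024]);              // whatever payload the test calls for
    out.close();
  }
}

Roughly the same override should be possible from the shell, e.g.
"hadoop fs -D dfs.block.size=134217728 -put bigData /test/bigData" (property name
as in 0.20; check hdfs-default.xml for your release).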

Thanks
-Todd

On Wed, Jan 13, 2010 at 1:21 PM, <Zl...@barclayscapital.com> wrote:

> Greetings,
>
> I am testing how HDFS scales with very large number of blocks.  I did
> the following setup:
>
> Set the default blocks size to 1KB
> Started 8 insert processes, each inserting a 16MB file
> Repeated the insert 3 times, keeping the already inserted files in HDFS
> Repeated the entire experiment on one cluster with 4 and another with 11
> identical datanodes (allocated through HOD)
>
> Results:
> The first 128MB (2^18 blocks) insert finished in 5 minutes.  The second
> in 12 minutes.  The third didn't finish within 1 hour.  The 11-node
> cluster was marginally faster.
>
> Throughout this I was storing all available metrics.  There were no
> signs of insufficient memory on any of the nodes; CPU usage and garbage
> collections were constant throughout.  If anyone is interested I can
> provide the recorded metrics.  I've attached a chart that looks clearly
> logarithmic.
>
> Can anyone please point to what could be the bottleneck here?  I'm
> evaluating HDFS for usage scenarios requiring 2^(a lot more than 18)
> blocks.
>
> <<insertion_rate_4_and_11_datanodes.JPG>>
> Best Regards,
> Zlatin Balevsky
>

Re: Exponential performance decay - mystery solved

Posted by Eli Collins <el...@cloudera.com>.
Hey Zlatin,

That makes sense. No apologies necessary, was a very useful exercise.

Thanks,
Eli


On Thu, Jan 21, 2010 at 11:41 AM,  <Zl...@barclayscapital.com> wrote:
>
>
> Alright, the problem was caused by me setting the frequency of a block
> report to 30 seconds.  The idea behind that was to create more load on
> the Namenode, but I didn't notice that those block reports were taking
> increasing amounts of time to generate.  During that time, a lock was
> held which I'm guessing didn't allow the reporting datanode to perform
> its functions.
>
> On my hardware, with 100,000 blocks the report takes over 7 seconds.  So
> every datanode was unavailable for 7 out of every 30 seconds.  Changing
> the interval to a more reasonable value restored the insertion speed to
> linear.
>
> Apologies for creating this confusion, nevertheless it was a useful
> thing to learn.
>
> Regards,
> Zlatin
>
> -----Original Message-----
> From: Eli Collins [mailto:eli@cloudera.com]
> Sent: Thursday, January 21, 2010 2:02 PM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: Exponential performance decay - possible lead
>
>>
>> The messages are of the following:
>>
>> 2010-01-18 14:51:25,694 WARN org.apache.hadoop.hdfs.StateChange:
>> BLOCK* NameSystem.addStoredBlock: Redundant addStoredBlock request
>> received for blk_-5804440919363539694_1026 on ip.removed:port.removed
>> size 1024
>
> This is odd; you shouldn't be getting this warning, and I don't see it when
> running your benchmark on my cluster. Are there other relevant warnings or
> errors in the NN or DN logs?
>
> Thanks,
> Eli

RE: Exponential performance decay - mystery solved

Posted by Zl...@barclayscapital.com.
Happy to report this doesn't happen with 0.21, even with a block report interval of 30 seconds.

Zlatin

________________________________
From: Raghu Angadi [mailto:rangadi@apache.org]
Sent: Thursday, January 21, 2010 7:19 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Exponential performance decay - mystery solved


http://issues.apache.org/jira/browse/HADOOP-4584 is supposed to fix this exact problem with the block reports. Were you running 0.21 or 0.20?

Raghu.

On Thu, Jan 21, 2010 at 11:41 AM, <Zl...@barclayscapital.com> wrote:


Alright, the problem was caused by me setting the frequency of a block
report to 30 seconds.  The idea behind that was to create more load on
the Namenode, but I didn't notice that those block reports were taking
increasing amounts of time to generate.  During that time, a lock was
held which I'm guessing didn't allow the reporting datanode to perform
its functions.

On my hardware, with 100,000 blocks the report takes over 7 seconds.  So
every datanode was unavailable for 7 out of every 30 seconds.  Changing
the interval to a more reasonable value restored the insertion speed to
linear.

Apologies for creating this confusion, nevertheless it was a useful
thing to learn.

Regards,
Zlatin

-----Original Message-----
From: Eli Collins [mailto:eli@cloudera.com]
Sent: Thursday, January 21, 2010 2:02 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Exponential performance decay - possible lead

>
> The messages are of the following:
>
> 2010-01-18 14:51:25,694 WARN org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.addStoredBlock: Redundant addStoredBlock request
> received for blk_-5804440919363539694_1026 on ip.removed:port.removed
> size 1024

This is odd; you shouldn't be getting this warning, and I don't see it when
running your benchmark on my cluster. Are there other relevant warnings or
errors in the NN or DN logs?

Thanks,
Eli


Re: Exponential performance decay - mystery solved

Posted by Raghu Angadi <ra...@apache.org>.
http://issues.apache.org/jira/browse/HADOOP-4584 is supposed to fix this
exact problem with the block reports. Were you running 0.21 or 0.20?

Raghu.

On Thu, Jan 21, 2010 at 11:41 AM, <Zl...@barclayscapital.com> wrote:

>
>
> Alright, the problem was caused by me setting the frequency of a block
> report to 30 seconds.  The idea behind that was to create more load on
> the Namenode, but I didn't notice that those block reports were taking
> increasing amounts of time to generate.  During that time, a lock was
> held which I'm guessing didn't allow the reporting datanode to perform
> its functions.
>
> On my hardware, with 100,000 blocks the report takes over 7 seconds.  So
> every datanode was unavailable for 7 out of every 30 seconds.  Changing
> the interval to a more reasonable value restored the insertion speed to
> linear.
>
> Apologies for creating this confusion, nevertheless it was a useful
> thing to learn.
>
> Regards,
> Zlatin
>
> -----Original Message-----
> From: Eli Collins [mailto:eli@cloudera.com]
> Sent: Thursday, January 21, 2010 2:02 PM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: Exponential performance decay - possible lead
>
> >
> > The messages are of the following:
> >
> > 2010-01-18 14:51:25,694 WARN org.apache.hadoop.hdfs.StateChange:
> > BLOCK* NameSystem.addStoredBlock: Redundant addStoredBlock request
> > received for blk_-5804440919363539694_1026 on ip.removed:port.removed
> > size 1024
>
> This is odd; you shouldn't be getting this warning, and I don't see it when
> running your benchmark on my cluster. Are there other relevant warnings or
> errors in the NN or DN logs?
>
> Thanks,
> Eli

RE: Exponential performance decay - mystery solved

Posted by Zl...@barclayscapital.com.
My 2c: if it is not possible to move the i/o operations listFiles() and
length() outside the lock on FSVolumeSet, maybe set a flag that a block
report is in progress so that the rest of the datanode doesn't just
hang. 
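
A rough sketch of that idea (hypothetical class and method names, not the actual
DataNode code) could look like the following: do the listFiles()/length() style
I/O without holding the volume lock, take the lock only to hand over the
finished snapshot, and expose a flag so the rest of the datanode can tell that a
report is in progress.

import java.io.File;
import java.util.ArrayList;
import java.util.List;

class BlockReportSketch {
  private final Object volumeLock = new Object();     // stand-in for the FSVolumeSet lock
  private volatile boolean reportInProgress = false;  // the "flag" idea
  private List<long[]> lastReport = new ArrayList<long[]>();

  List<long[]> buildReport(File blockDir) {
    reportInProgress = true;
    try {
      List<long[]> snapshot = new ArrayList<long[]>();
      File[] files = blockDir.listFiles();             // expensive I/O, done outside the lock
      if (files != null) {
        for (File f : files) {
          snapshot.add(new long[] { f.getName().hashCode(), f.length() });
        }
      }
      synchronized (volumeLock) {                      // lock held only for the cheap hand-off
        lastReport = snapshot;
      }
      return snapshot;
    } finally {
      reportInProgress = false;
    }
  }

  boolean isReportInProgress() {
    return reportInProgress;
  }
}

The hand-off under the lock is O(1), so other datanode work would only block for
that swap rather than for the whole directory scan.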
 
Thanks,
Zlatin

________________________________

From: Dhruba Borthakur [mailto:dhruba@gmail.com] 
Sent: Thursday, January 21, 2010 3:38 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Exponential performance decay - mystery solved


Some of this delay in generating block reports might be mitigated via
http://issues.apache.org/jira/browse/HDFS-854 

thanks,
dhruba


On Thu, Jan 21, 2010 at 11:41 AM, <Zl...@barclayscapital.com> wrote:


	Alright, the problem was caused by me setting the frequency of a block
	report to 30 seconds.  The idea behind that was to create more load on
	the Namenode, but I didn't notice that those block reports were taking
	increasing amounts of time to generate.  During that time, a lock was
	held which I'm guessing didn't allow the reporting datanode to perform
	its functions.

	On my hardware, with 100,000 blocks the report takes over 7 seconds.  So
	every datanode was unavailable for 7 out of every 30 seconds.  Changing
	the interval to a more reasonable value restored the insertion speed to
	linear.

	Apologies for creating this confusion, nevertheless it was a useful
	thing to learn.

	Regards,
	Zlatin

	-----Original Message-----
	From: Eli Collins [mailto:eli@cloudera.com]
	Sent: Thursday, January 21, 2010 2:02 PM
	To: hdfs-user@hadoop.apache.org
	Subject: Re: Exponential performance decay - possible lead

	>
	> The messages are of the following form:
	>
	> 2010-01-18 14:51:25,694 WARN org.apache.hadoop.hdfs.StateChange:
	> BLOCK* NameSystem.addStoredBlock: Redundant addStoredBlock request
	> received for blk_-5804440919363539694_1026 on ip.removed:port.removed
	> size 1024

	This is odd; you shouldn't be getting this warning, and I don't see it
	when running your benchmark on my cluster. Are there other relevant
	warnings or errors in the NN or DN logs?

	Thanks,
	Eli




-- 
Connect to me at http://www.facebook.com/dhruba



Re: Exponential performance decay - mystery solved

Posted by Dhruba Borthakur <dh...@gmail.com>.
Some of this delay in generating block reports might be mitigated via
http://issues.apache.org/jira/browse/HDFS-854

thanks,
dhruba


On Thu, Jan 21, 2010 at 11:41 AM, <Zl...@barclayscapital.com> wrote:

>
>
> Alright, the problem was caused by me setting the frequency of a block
> report to 30 seconds.  The idea behind that was to create more load on
> the Namenode, but I didn't notice that those block reports were taking
> increasing amounts of time to generate.  During that time, a lock was
> held which I'm guessing didn't allow the reporting datanode to perform
> its functions.
>
> On my hardware, with 100,000 blocks the report takes over 7 seconds.  So
> every datanode was unavailable for 7 out of every 30 seconds.  Changing
> the interval to a more reasonable value restored the insertion speed to
> linear.
>
> Apologies for creating this confusion, nevertheless it was a useful
> thing to learn.
>
> Regards,
> Zlatin
>
> -----Original Message-----
> From: Eli Collins [mailto:eli@cloudera.com]
> Sent: Thursday, January 21, 2010 2:02 PM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: Exponential performance decay - possible lead
>
> >
> > The messages are of the following:
> >
> > 2010-01-18 14:51:25,694 WARN org.apache.hadoop.hdfs.StateChange:
> > BLOCK* NameSystem.addStoredBlock: Redundant addStoredBlock request
> > received for blk_-5804440919363539694_1026 on ip.removed:port.removed
> > size 1024
>
> This is odd; you shouldn't be getting this warning, and I don't see it when
> running your benchmark on my cluster. Are there other relevant warnings or
> errors in the NN or DN logs?
>
> Thanks,
> Eli



-- 
Connect to me at http://www.facebook.com/dhruba

RE: Exponential performance decay - mystery solved

Posted by Zl...@barclayscapital.com.

Alright, the problem was caused by me setting the frequency of a block
report to 30 seconds.  The idea behind that was to create more load on
the Namenode, but I didn't notice that those block reports were taking
increasing amounts of time to generate.  During that time, a lock was
held which I'm guessing didn't allow the reporting datanode to perform
its functions.

On my hardware, with 100,000 blocks the report takes over 7 seconds.  So
every datanode was unavailable for 7 out of every 30 seconds.  Changing
the interval to a more reasonable value restored the insertion speed to
linear.
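
For reference, the interval in question is the datanode setting
dfs.blockreport.intervalMsec (milliseconds) in hdfs-site.xml; the stock default
in 0.20 is one hour (check hdfs-default.xml for your release). A minimal
excerpt, with the value shown purely as an example:

<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>3600000</value>
</property>

The test above had effectively set this to 30000, which is what produced the
7-seconds-out-of-every-30 stalls.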

Apologies for creating this confusion, nevertheless it was a useful
thing to learn.

Regards,
Zlatin

-----Original Message-----
From: Eli Collins [mailto:eli@cloudera.com] 
Sent: Thursday, January 21, 2010 2:02 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Exponential performance decay - possible lead

>
> The messages are of the following:
>
> 2010-01-18 14:51:25,694 WARN org.apache.hadoop.hdfs.StateChange: 
> BLOCK* NameSystem.addStoredBlock: Redundant addStoredBlock request 
> received for blk_-5804440919363539694_1026 on ip.removed:port.removed 
> size 1024

This is odd; you shouldn't be getting this warning, and I don't see it when
running your benchmark on my cluster. Are there other relevant warnings or
errors in the NN or DN logs?

Thanks,
Eli

Re: Exponential performance decay - possible lead

Posted by Eli Collins <el...@cloudera.com>.
>
> The messages are of the following:
>
> 2010-01-18 14:51:25,694 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: Redundant addStoredBlock request received for blk_-5804440919363539694_1026 on ip.removed:port.removed size 1024

This is odd; you shouldn't be getting this warning, and I don't see it when
running your benchmark on my cluster. Are there other relevant warnings or
errors in the NN or DN logs?

Thanks,
Eli

RE: Exponential performance decay - possible lead

Posted by Zl...@barclayscapital.com.
Hi everyone,

I think I have a lead on this issue.  I was watching the namenode metrics in the ganglia interface and noticed a discrepancy between the rate of log messages at INFO level and those at WARN level.  I'm attaching the two graphs.

The graphs for log messages at INFO level at the Namenode and the Datanodes have the same logarithmic shape, just like the graphs for the number of blocks inserted, block insertion ops, etc.  However, the graph for messages at WARN level at the Namenode has a linear shape.  So even though the insertion rate is slowing down, the rate of these warning messages is constant.

The messages are of the following form:

2010-01-18 14:51:25,694 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: Redundant addStoredBlock request received for blk_-5804440919363539694_1026 on ip.removed:port.removed size 1024

Let me know if I can help any further.
Zlatin Balevsky

-----Original Message-----
From: Balevsky, Zlatin: IT (NYK) 
Sent: Friday, January 15, 2010 8:26 AM
To: hdfs-user@hadoop.apache.org
Subject: RE: Exponential performance decay when inserting large number of blocks

In the first few tests I created a total of 24 files.  In the test with the last graph, 88 files before the break and 16 after.  The files were created with:

#!/bin/bash
# Kick off 8 parallel "hadoop fs -put" uploads of the same local file under
# distinct HDFS names; $1 is the namenode URI, $2 the run number.
HDFS_URI=$1
INSERT_NO=$2
for i in $(seq 1 8)
do
 path/to/hadoop/bin/hadoop fs -put bigData hdfs://$HDFS_URI/bigData$INSERT_NO-$i > put$i.log 2>&1 &
done

And I invoke it manually, increasing $2 each run.  
-----Original Message-----
From: Eli Collins [mailto:eli@cloudera.com]
Sent: Thursday, January 14, 2010 9:13 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Exponential performance decay when inserting large number of blocks

On Thu, Jan 14, 2010 at 3:53 PM,  <Zl...@barclayscapital.com> wrote:
> And the results after several hour wait are the same.  The cluster was 
> absolutely idle between the insertions.  Attached is a graph.

How many files are you creating?  Can you post the script you're using to drive the test?

Thanks,
Eli

RE: Exponential performance decay when inserting large number of blocks

Posted by Zl...@barclayscapital.com.
In the first few tests I created a total of 24 files.  In the test with the last graph, 88 files before the break and 16 after.  The files were created with:

#!/bin/bash
# Kick off 8 parallel "hadoop fs -put" uploads of the same local file under
# distinct HDFS names; $1 is the namenode URI, $2 the run number.
HDFS_URI=$1
INSERT_NO=$2
for i in $(seq 1 8)
do
 path/to/hadoop/bin/hadoop fs -put bigData hdfs://$HDFS_URI/bigData$INSERT_NO-$i > put$i.log 2>&1 &
done

And I invoke it manually, increasing $2 each run.  
-----Original Message-----
From: Eli Collins [mailto:eli@cloudera.com] 
Sent: Thursday, January 14, 2010 9:13 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Exponential performance decay when inserting large number of blocks

On Thu, Jan 14, 2010 at 3:53 PM,  <Zl...@barclayscapital.com> wrote:
> And the results after several hour wait are the same.  The cluster was 
> absolutely idle between the insertions.  Attached is a graph.

How many files are you creating?  Can you post the script you're using to drive the test?

Thanks,
Eli

Re: Exponential performance decay when inserting large number of blocks

Posted by Eli Collins <el...@cloudera.com>.
On Thu, Jan 14, 2010 at 3:53 PM,  <Zl...@barclayscapital.com> wrote:
> And the results after several hour wait are the same.  The cluster was
> absolutely idle between the insertions.  Attached is a graph.

How many files are you creating?  Can you post the script you're using
to drive the test?

Thanks,
Eli

RE: Exponential performance decay when inserting large number of blocks

Posted by Zl...@barclayscapital.com.
And the results after several hour wait are the same.  The cluster was
absolutely idle between the insertions.  Attached is a graph.
 
Zlatin

________________________________

From: Balevsky, Zlatin: IT (NYK) 
Sent: Thursday, January 14, 2010 12:59 PM
To: hdfs-user@hadoop.apache.org
Subject: RE: Exponential performance decay when inserting large number of blocks


Alex,
 
that would have to be some very low-level issue like the NIC not
properly cleaning up buffers or something at the switch level.  If that
were the case it would have become visible between tests of different
clusters.  It is not disk access issue as I'm currently inserting the
same 512MB file with 128kb block size, and it is not insertion client
issue as that gets restarted often.
 
Zlatin

________________________________

From: alex kamil [mailto:alex.kamil@gmail.com] 
Sent: Thursday, January 14, 2010 12:52 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Exponential performance decay when inserting large number of blocks


Todd, but the "exponential decay" in insert rate may be caused by a
problem on the source as well as a problem on the "target". I think it
is worth exploring the source option as well.


On Thu, Jan 14, 2010 at 12:47 PM, Todd Lipcon <to...@cloudera.com> wrote:


	On Thu, Jan 14, 2010 at 9:28 AM, alex kamil <alex.kamil@gmail.com> wrote:

		>I'm doing the insert from a node on the same rack as the cluster but it is not part of it.

		So it looks like you are copying from a single node, I'd try to run the
		inserts from multiple nodes in parallel, to avoid IO and/or CPU and/or
		network bottleneck on the "source" node. Try to upload from multiple
		locations.

	You're still missing the point here, Alex: it's not about the performance,
	it's about the scaling curve.

	-Todd

		On Thu, Jan 14, 2010 at 12:13 PM, <Zlatin.Balevsky@barclayscapital.com> wrote:

			More general info:

			I'm doing the insert from a node on the same rack as the cluster but
			it is not part of it.  Data is being read from a local disk and the
			datanodes store to the local partitions as well.  The filesystem is
			ext3, but if it were an inode issue the 4-node cluster would perform
			much worse than the 11-node.  No MR jobs or any other activity is
			present - these are test clusters that I create and remove with HOD.
			I'm using revision 897952 (the hdfs ui reports 897347 for some
			reason), checked out the branch-0.20 a few days ago.

			Todd:

			I will repeat the test, waiting several hours after the first round
			of inserts.  Unless the balancer daemon starts by default, I have not
			started it.  The datablocks seemed uniformly spread amongst the
			datanodes.  I've added two additional metrics to be recorded by the
			datanode - DataNode.xmitsInProgress and DataNode.getXCeiverCount().
			These are polled every 10 seconds.  If anyone wants me to add
			additional metrics at any component let me know.

			> The test case is making files with a ton of blocks. Appending a
			> block to an end of a file might be O(n) - usually this isn't a
			> problem since even large files are almost always <100 blocks, and
			> the majority <10.  In the test, there are files with 50,000+
			> blocks, so O(n) runtime anywhere in the blocklist for a file is
			> pretty bad.

			The files in the first test are 16k blocks each.  I am inserting the
			same file under different filenames in consecutive runs.  If that
			were the reason, the first insert should take the same amount of time
			as the last.  Nevertheless, I will run the next test with 4k blocks
			per file and increase the number of consecutive insertions.

			Dhruba:
			Unless there is a very high number of collisions, the hashmap should
			perform in constant time.  Even if there were collisions, I would be
			seeing much higher CPU usage on the NameNode.  According to the
			metrics I've already sent, towards the end of the test the capacity
			of the BlockMap was 512k and the load approaching 0.66.

			Best Regards,
			Zlatin Balevsky

			P.S. I could not find contact info for the HOD developers.  I'd like
			to ask them to document the "walltime" and "idleness-limit"
			parameters!

________________________________

			From: Dhruba Borthakur [mailto:dhruba@gmail.com]
			Sent: Thursday, January 14, 2010 9:04 AM
			To: hdfs-user@hadoop.apache.org
			Subject: Re: Exponential performance decay when inserting large number of blocks

			Here is another thing that came to my mind.

			The Namenode has a hash map in memory where it inserts all blocks.
			When a new block needs to be allocated, the namenode first generates
			a random number and checks to see if it exists in the hashmap. If it
			does not exist in the hash map, then that number is the block id of
			the to-be-allocated block. The namenode then inserts this number into
			the hash map and sends it to the client. The client receives it as
			the blockid and uses it to write data to the datanode(s).

			One possibility is that the time to do a hash-lookup varies depending
			on the number of blocks in the hash.

			dhruba

			On Wed, Jan 13, 2010 at 8:57 PM, alex kamil <alex.kamil@gmail.com> wrote:

				>launched 8 instances of the bin/hadoop fs -put utility
				Zlatin, may be a silly question, are you running dfs -put locally on
				each datanode, or from a single box?
				Also, where are you copying the data from? Do you have local copies
				on each node before the insert, or do all your files reside on a
				single server, or maybe on NFS?
				i would also check the network stats on datanodes and namenode and
				see if the nics are not saturated, i guess you have enough bandwidth
				but maybe there is some issue with the NIC on the namenode or
				something, i saw strange things happening. you can probably monitor
				the number of connections/sockets, bandwidth, IO waits, # of threads
				if you are writing to dfs from a single location maybe there is a
				problem on a single node to handle all this outbound traffic, if you
				are distributing files in parallel from multiple nodes, then maybe
				there is an inbound congestion on namenode or something like that

				if it's not the case, i'd explore using the distcp utility for
				copying data in parallel (it comes with the distro)
				also if you really hit a wall, and have some time, i'd take a look
				at alternatives to the FileSystem API, maybe something like Fuse-DFS
				and other packages supported by libhdfs
				(http://wiki.apache.org/hadoop/LibHDFS)

				On Wed, Jan 13, 2010 at 11:00 PM, Todd Lipcon <to...@cloudera.com> wrote:

				Err, ignore that attachment - attached the wrong graph with the
				right labels!

				Here's the right graph.

				-Todd

				On Wed, Jan 13, 2010 at 7:53 PM, Todd Lipcon <to...@cloudera.com> wrote:

				On Wed, Jan 13, 2010 at 6:59 PM, Eric Sammer <er...@lifeless.net> wrote:

				On 1/13/10 8:12 PM, Zlatin.Balevsky@barclayscapital.com wrote:
				> Alex, Dhruba
				>
				> I repeated the experiment increasing the block size to 32k.  Still
				> doing 8 inserts in parallel, file size now is 512 MB; 11 datanodes.
				> I was also running iostat on one of the datanodes.  Did not notice
				> anything that would explain an exponential slowdown.  There was
				> more activity while the inserts were active but far from the limits
				> of the disk system.

				While creating many blocks, could it be that the replication pipe
				lining is eating up the available handler threads on the data nodes?
				By increasing the block size you would see better performance
				because the system spends more time writing data to local disk and
				less time dealing with things like replication "overhead." At a
				small block size, I could imagine you're artificially creating a
				situation where you saturate the default size configured thread
				pools or something weird like that.

				If you're doing 8 inserts in parallel from one machine with 11 nodes
				this seems unlikely, but it might be worth looking into. The
				question is if testing with an artificially small block size like
				this is even a viable test. At some point the overhead of talking to
				the name node, selecting data nodes for a block, and setting up
				replication pipe lines could become some abnormally high percentage
				of the run time.

				The concern isn't why the insertion is slow, but rather why the
				scaling curve looks the way it does. Looking at the data, it looks
				like the insertion rate (blocks per second) is actually related as
				1/n where N is the number of blocks. Attaching another graph of the
				same data which I think is a little clearer to read.

				Also, I wonder if the cluster is trying to rebalance blocks toward
				the end of your runtime (if the balancer daemon is running) and this
				is causing additional shuffling of data.

				That's certainly one possibility.

				Zlatin: here's a test to try: after the FS is full with 400,000
				blocks, let the cluster sit for a few hours, then come back and
				start another insertion. Is the rate slow, or does it return to the
				fast starting speed?

				-Todd

			-- 
			Connect to me at http://www.facebook.com/dhruba
			







RE: Exponential performance decay when inserting large number of blocks

Posted by Zl...@barclayscapital.com.
Alex,
 
that would have to be some very low-level issue like the NIC not
properly cleaning up buffers or something at the switch level.  If that
were the case it would have become visible between tests of different
clusters.  It is not disk access issue as I'm currently inserting the
same 512MB file with 128kb block size, and it is not insertion client
issue as that gets restarted often.
 
Zlatin

________________________________

From: alex kamil [mailto:alex.kamil@gmail.com] 
Sent: Thursday, January 14, 2010 12:52 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Exponential performance decay when inserting large number
of blocks


Todd, but the "exponential decay" in insert rate may be caused by a
problem on the source as well as a problem on the "target". I think it
is worth exploring the source option as well.


On Thu, Jan 14, 2010 at 12:47 PM, Todd Lipcon <to...@cloudera.com> wrote:


	On Thu, Jan 14, 2010 at 9:28 AM, alex kamil <al...@gmail.com> wrote:

		>I'm doing the insert from a node on the same rack as the cluster but it is not part of it.

		So it looks like you are copying from a single node, I'd try to run the
		inserts from multiple nodes in parallel, to avoid IO and/or CPU and/or
		network bottleneck on the "source" node. Try to upload from multiple
		locations.

	You're still missing the point here, Alex: it's not about the performance,
	it's about the scaling curve.

	-Todd

		On Thu, Jan 14, 2010 at 12:13 PM, <Zl...@barclayscapital.com> wrote:

			More general info:

			I'm doing the insert from a node on the same rack as the cluster but
			it is not part of it.  Data is being read from a local disk and the
			datanodes store to the local partitions as well.  The filesystem is
			ext3, but if it were an inode issue the 4-node cluster would perform
			much worse than the 11-node.  No MR jobs or any other activity is
			present - these are test clusters that I create and remove with HOD.
			I'm using revision 897952 (the hdfs ui reports 897347 for some
			reason), checked out the branch-0.20 a few days ago.

			Todd:

			I will repeat the test, waiting several hours after the first round
			of inserts.  Unless the balancer daemon starts by default, I have not
			started it.  The datablocks seemed uniformly spread amongst the
			datanodes.  I've added two additional metrics to be recorded by the
			datanode - DataNode.xmitsInProgress and DataNode.getXCeiverCount().
			These are polled every 10 seconds.  If anyone wants me to add
			additional metrics at any component let me know.

			> The test case is making files with a ton of blocks. Appending a
			> block to an end of a file might be O(n) - usually this isn't a
			> problem since even large files are almost always <100 blocks, and
			> the majority <10.  In the test, there are files with 50,000+
			> blocks, so O(n) runtime anywhere in the blocklist for a file is
			> pretty bad.

			The files in the first test are 16k blocks each.  I am inserting the
			same file under different filenames in consecutive runs.  If that
			were the reason, the first insert should take the same amount of time
			as the last.  Nevertheless, I will run the next test with 4k blocks
			per file and increase the number of consecutive insertions.

			Dhruba:
			Unless there is a very high number of collisions, the hashmap should
			perform in constant time.  Even if there were collisions, I would be
			seeing much higher CPU usage on the NameNode.  According to the
			metrics I've already sent, towards the end of the test the capacity
			of the BlockMap was 512k and the load approaching 0.66.

			Best Regards,
			Zlatin Balevsky

			P.S. I could not find contact info for the HOD developers.  I'd like
			to ask them to document the "walltime" and "idleness-limit"
			parameters!

________________________________

			From: Dhruba Borthakur [mailto:dhruba@gmail.com]
			Sent: Thursday, January 14, 2010 9:04 AM
			To: hdfs-user@hadoop.apache.org
			Subject: Re: Exponential performance decay when inserting large number of blocks

			Here is another thing that came to my mind.

			The Namenode has a hash map in memory where it inserts all blocks.
			When a new block needs to be allocated, the namenode first generates
			a random number and checks to see if it exists in the hashmap. If it
			does not exist in the hash map, then that number is the block id of
			the to-be-allocated block. The namenode then inserts this number into
			the hash map and sends it to the client. The client receives it as
			the blockid and uses it to write data to the datanode(s).

			One possibility is that the time to do a hash-lookup varies depending
			on the number of blocks in the hash.

			dhruba

			On Wed, Jan 13, 2010 at 8:57 PM, alex kamil <al...@gmail.com> wrote:

				>launched 8 instances of the bin/hadoop fs -put utility
				Zlatin, may be a silly question, are you running dfs -put locally on
				each datanode, or from a single box?
				Also, where are you copying the data from? Do you have local copies
				on each node before the insert, or do all your files reside on a
				single server, or maybe on NFS?
				i would also check the network stats on datanodes and namenode and
				see if the nics are not saturated, i guess you have enough bandwidth
				but maybe there is some issue with the NIC on the namenode or
				something, i saw strange things happening. you can probably monitor
				the number of connections/sockets, bandwidth, IO waits, # of threads
				if you are writing to dfs from a single location maybe there is a
				problem on a single node to handle all this outbound traffic, if you
				are distributing files in parallel from multiple nodes, then maybe
				there is an inbound congestion on namenode or something like that

				if it's not the case, i'd explore using the distcp utility for
				copying data in parallel (it comes with the distro)
				also if you really hit a wall, and have some time, i'd take a look
				at alternatives to the FileSystem API, maybe something like Fuse-DFS
				and other packages supported by libhdfs
				(http://wiki.apache.org/hadoop/LibHDFS)

				On Wed, Jan 13, 2010 at 11:00 PM, Todd Lipcon <to...@cloudera.com> wrote:

				Err, ignore that attachment - attached the wrong graph with the
				right labels!

				Here's the right graph.

				-Todd

				On Wed, Jan 13, 2010 at 7:53 PM, Todd Lipcon <to...@cloudera.com> wrote:

				On Wed, Jan 13, 2010 at 6:59 PM, Eric Sammer <er...@lifeless.net> wrote:

				On 1/13/10 8:12 PM, Zlatin.Balevsky@barclayscapital.com wrote:
				> Alex, Dhruba
				>
				> I repeated the experiment increasing the block size to 32k.  Still
				> doing 8 inserts in parallel, file size now is 512 MB; 11 datanodes.
				> I was also running iostat on one of the datanodes.  Did not notice
				> anything that would explain an exponential slowdown.  There was
				> more activity while the inserts were active but far from the limits
				> of the disk system.

				While creating many blocks, could it be that the replication pipe
				lining is eating up the available handler threads on the data nodes?
				By increasing the block size you would see better performance
				because the system spends more time writing data to local disk and
				less time dealing with things like replication "overhead." At a
				small block size, I could imagine you're artificially creating a
				situation where you saturate the default size configured thread
				pools or something weird like that.

				If you're doing 8 inserts in parallel from one machine with 11 nodes
				this seems unlikely, but it might be worth looking into. The
				question is if testing with an artificially small block size like
				this is even a viable test. At some point the overhead of talking to
				the name node, selecting data nodes for a block, and setting up
				replication pipe lines could become some abnormally high percentage
				of the run time.

				The concern isn't why the insertion is slow, but rather why the
				scaling curve looks the way it does. Looking at the data, it looks
				like the insertion rate (blocks per second) is actually related as
				1/n where N is the number of blocks. Attaching another graph of the
				same data which I think is a little clearer to read.

				Also, I wonder if the cluster is trying to rebalance blocks toward
				the end of your runtime (if the balancer daemon is running) and this
				is causing additional shuffling of data.

				That's certainly one possibility.

				Zlatin: here's a test to try: after the FS is full with 400,000
				blocks, let the cluster sit for a few hours, then come back and
				start another insertion. Is the rate slow, or does it return to the
				fast starting speed?

				-Todd

			-- 
			Connect to me at http://www.facebook.com/dhruba
			






Re: Exponential performance decay when inserting large number of blocks

Posted by alex kamil <al...@gmail.com>.
Todd, but the "exponential decay" in insert rate may be caused by a problem
on the source as well as a problem on the "target". I think it is worth
exploring the source option as well.

On Thu, Jan 14, 2010 at 12:47 PM, Todd Lipcon <to...@cloudera.com> wrote:

> On Thu, Jan 14, 2010 at 9:28 AM, alex kamil <al...@gmail.com> wrote:
>
>> >I'm doing the insert from a node on the same rack as the cluster but it
>> is not part of it.
>> So it looks like you are copying from a single node, I'd try to run the
>> inserts from multiple nodes in parallel, to avoid IO and/or CPU and/or
>> network bottleneck on the "source" node. Try to upload from multiple
>> locations.
>>
>>
> You're still missing the point here, Alex: it's not about the performance,
> it's about the scaling curve.
>
> -Todd
>
>
>>
>> On Thu, Jan 14, 2010 at 12:13 PM, <Zl...@barclayscapital.com> wrote:
>>
>>>   More general info:
>>>
>>> I'm doing the insert from a node on the same rack as the cluster but it
>>> is not part of it.  Data is being read from a local disk and the datanodes
>>> store to the local partitions as well.  The filesystem is ext3, but if it
>>> were an inode issue the 4-node cluster would perform much worse than the
>>> 11-node.  No MR jobs or any other activity is present - these are test
>>> clusters that I create and remove with HOD.  I'm using revision 897952 (the
>>> hdfs ui reports 897347 for some reason) , checked out the branch-0.20 a few
>>> days ago.
>>>
>>> Todd:
>>>
>>> I will repeat the test, waiting several hours after the first round of
>>> inserts.  Unless the balancer daemon starts by default, I have not started
>>> it.  The datablocks seemed uniformly spread amongst the datanodes.  I've
>>> added two additional metrics to be recorded by the datanode -
>>> DataNode.xmitsInProgress and DataNode.getXCeiverCount().  These are polled
>>> every 10 seconds.  If anyone wants me to add additional metrics at any
>>> component let me know.
>>>
>>>  > The test case is making files with a ton of blocks. Appending a block
>>> to an end of a file might be O(n) -
>>> > usually this isn't a problem since even large files are almost always
>>> <100 blocks, and the majority <10.
>>> > In the test, there are files with 50,000+ blocks, so O(n) runtime
>>> anywhere in the blocklist for a file is pretty bad.
>>>
>>>  The files in the first test are 16k blocks each.  I am inserting the
>>> same file under different filenames in consecutive runs.  If that were the
>>> reason, the first insert should take the same amount of time as the last.
>>> Nevertheless, I will run the next test with 4k blocks per file and increase
>>> the number of consecutive insertions.
>>>
>>> Dhruba:
>>> Unless there is a very high number of collisions, the hashmap should
>>> perform in constant time.  Even if there were collisions, I would be seeing
>>> much higher CPU usage on the NameNode.  According to the metrics I've
>>> already sent, towards the end of the test the capacity of the BlockMap
>>> was 512k and the load approaching 0.66.
>>>
>>> Best Regards,
>>> Zlatin Balevsky
>>>
>>> P.S. I could not find contact info for the HOD developers.  I'd like to
>>> ask them to document the "walltime" and "idleness-limit" parameters!
>>>
>>>  ------------------------------
>>> *From:* Dhruba Borthakur [mailto:dhruba@gmail.com]
>>> *Sent:* Thursday, January 14, 2010 9:04 AM
>>>
>>> *To:* hdfs-user@hadoop.apache.org
>>> *Subject:* Re: Exponential performance decay when inserting large number
>>> of blocks
>>>
>>> Here is another thing that came to my mind.
>>>
>>> The Namenode has a hash map in memory where it inserts all blocks. when a
>>> new block needs to be allocated, the namenode first generates a random
>>> number and checks to see if ti exists in the hashmap. If it does not exist
>>> in the hash map, then that number is the block id of the to-be-allocated
>>> block. The namenode then inserts this number into the hash map and sends it
>>> to te client. The client receives it as the blockid and uses it to write
>>> data to the datanode(s).
>>>
>>> One possibility is that that the time to do a hash-lookup varies
>>> depending on the number of blocks in the hash.
>>>
>>> dhruba
>>>
>>>
>>>
>>>
>>> On Wed, Jan 13, 2010 at 8:57 PM, alex kamil <al...@gmail.com>wrote:
>>>
>>>> >launched 8 instances of the bin/hadoop fs -put utility
>>>> Zlatin, may be a silly question, are you running dfs -put locally on
>>>> each datanode,  or from a single box
>>>> Also where are you copying the data from, do you have local copies on
>>>> each node before the insert or all your files reside on a single server, or
>>>> may be on NFS?
>>>> i would also chk the network stats on datanodes and namenode and see if
>>>> the nics are not saturated, i guess you have enough bandwidth but may be
>>>> there is some issue with NIC on the namenode or something, i saw strange
>>>> things happening. you can probably monitor the number of conections/sockets,
>>>> bandwidth, IO waits, # of threads
>>>> if you are writing to dfs from a single location may be there is a
>>>> problem on a single node to handle all this outbound traffic, if you are
>>>> distributing files in parallel from multiple nodes, than mat be there is an
>>>> inbound congestion on namenode or something like that
>>>>
>>>> if its not the case, i'd explore using distcp utility for copying data
>>>> in parallel  (it comes with the distro)
>>>> also if you really hit a wall, and have some time, i'd take look at
>>>> alternatives to Filesystem API, may be simething like Fuse-DFS and other
>>>> packages supported by libhdfs (http://wiki.apache.org/hadoop/LibHDFS)
>>>>
>>>>
>>>> On Wed, Jan 13, 2010 at 11:00 PM, Todd Lipcon <to...@cloudera.com>wrote:
>>>>
>>>>> Err, ignore that attachment - attached the wrong graph with the right
>>>>> labels!
>>>>>
>>>>> Here's the right graph.
>>>>>
>>>>> -Todd
>>>>>
>>>>>
>>>>> On Wed, Jan 13, 2010 at 7:53 PM, Todd Lipcon <to...@cloudera.com>wrote:
>>>>>
>>>>>> On Wed, Jan 13, 2010 at 6:59 PM, Eric Sammer <er...@lifeless.net>wrote:
>>>>>>
>>>>>>> On 1/13/10 8:12 PM, Zlatin.Balevsky@barclayscapital.com wrote:
>>>>>>> > Alex, Dhruba
>>>>>>> >
>>>>>>> > I repeated the experiment increasing the block size to 32k.  Still
>>>>>>> doing
>>>>>>> > 8 inserts in parallel, file size now is 512 MB; 11 datanodes.  I
>>>>>>> was
>>>>>>> > also running iostat on one of the datanodes.  Did not notice
>>>>>>> anything
>>>>>>> > that would explain an exponential slowdown.  There was more
>>>>>>> activity
>>>>>>> > while the inserts were active but far from the limits of the disk
>>>>>>> system.
>>>>>>>
>>>>>>> While creating many blocks, could it be that the replication pipe
>>>>>>> lining
>>>>>>> is eating up the available handler threads on the data nodes? By
>>>>>>> increasing the block size you would see better performance because
>>>>>>> the
>>>>>>> system spends more time writing data to local disk and less time
>>>>>>> dealing
>>>>>>> with things like replication "overhead." At a small block size, I
>>>>>>> could
>>>>>>> imagine you're artificially creating a situation where you saturate
>>>>>>> the
>>>>>>> default size configured thread pools or something weird like that.
>>>>>>>
>>>>>>> If you're doing 8 inserts in parallel from one machine with 11 nodes
>>>>>>> this seems unlikely, but it might be worth looking into. The question
>>>>>>> is
>>>>>>> if testing with an artificially small block size like this is even a
>>>>>>> viable test. At some point the overhead of talking to the name node,
>>>>>>> selecting data nodes for a block, and setting up replication pipe
>>>>>>> lines
>>>>>>> could become some abnormally high percentage of the run time.
>>>>>>>
>>>>>>>
>>>>>> The concern isn't why the insertion is slow, but rather why the
>>>>>> scaling curve looks the way it does. Looking at the data, it looks like the
>>>>>> insertion rate (blocks per second) is actually related as 1/n where N is the
>>>>>> number of blocks. Attaching another graph of the same data which I think is
>>>>>> a little clearer to read.
>>>>>>
>>>>>>
>>>>>>> Also, I wonder if the cluster is trying to rebalance blocks toward
>>>>>>> the
>>>>>>> end of your runtime (if the balancer daemon is running) and this is
>>>>>>> causing additional shuffling of data.
>>>>>>>
>>>>>>
>>>>>> That's certainly one possibility.
>>>>>>
>>>>>> Zlatin: here's a test to try: after the FS is full with 400,000
>>>>>> blocks, let the cluster sit for a few hours, then come back and start
>>>>>> another insertion. Is the rate slow, or does it return to the fast starting
>>>>>> speed?
>>>>>>
>>>>>> -Todd
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Connect to me at http://www.facebook.com/dhruba
>>>
>>
>>
>

Re: Exponential performance decay when inserting large number of blocks

Posted by Todd Lipcon <to...@cloudera.com>.
On Thu, Jan 14, 2010 at 9:28 AM, alex kamil <al...@gmail.com> wrote:

> >I'm doing the insert from a node on the same rack as the cluster but it
> is not part of it.
> So it looks like you are copying from a single node, I'd try to run the
> inserts from multiple nodes in parallel, to avoid IO and/or CPU and/or
> network bottleneck on the "source" node. Try to upload from multiple
> locations.
>
>
You're still missing the point here, Alex: it's not about the performance,
it's about the scaling curve.

-Todd


>
> On Thu, Jan 14, 2010 at 12:13 PM, <Zl...@barclayscapital.com>wrote:
>
>>   More general info:
>>
>> I'm doing the insert from a node on the same rack as the cluster but it is
>> not part of it.  Data is being read from a local disk and the datanodes
>> store to the local partitions as well.  The filesystem is ext3, but if it
>> were an inode issue the 4-node cluster would perform much worse than the
>> 11-node.  No MR jobs or any other activity is present - these are test
>> clusters that I create and remove with HOD.  I'm using revision 897952 (the
>> hdfs ui reports 897347 for some reason) , checked out the branch-0.20 a few
>> days ago.
>>
>> Todd:
>>
>> I will repeat the test, waiting several hours after the first round of
>> inserts.  Unless the balancer daemon starts by default, I have not started
>> it.  The datablocks seemed uniformly spread amongst the datanodes.  I've
>> added two additional metrics to be recorded by the datanode -
>> DataNode.xmitsInProgress and DataNode.getXCeiverCount().  These are polled
>> every 10 seconds.  If anyone wants me to add additional metrics at any
>> component let me know.
>>
>>  > The test case is making files with a ton of blocks. Appending a block
>> to an end of a file might be O(n) -
>> > usually this isn't a problem since even large files are almost always
>> <100 blocks, and the majority <10.
>> > In the test, there are files with 50,000+ blocks, so O(n) runtime
>> anywhere in the blocklist for a file is pretty bad.
>>
>>  The files in the first test are 16k blocks each.  I am inserting the
>> same file under different filenames in consecutive runs.  If that were the
>> reason, the first insert should take the same amount of time as the last.
>> Nevertheless, I will run the next test with 4k blocks per file and increase
>> the number of consecutive insertions.
>>
>> Dhruba:
>> Unless there is a very high number of collisions, the hashmap should
>> perform in constant time.  Even if there were collisions, I would be seeing
>> much higher CPU usage on the NameNode.  According to the metrics I've
>> already sent, towards the end of the test the capacity of the BlockMap
>> was 512k and the load approaching 0.66.
>>
>> Best Regards,
>> Zlatin Balevsky
>>
>> P.S. I could not find contact info for the HOD developers.  I'd like to
>> ask them to document the "walltime" and "idleness-limit" parameters!
>>
>>  ------------------------------
>> *From:* Dhruba Borthakur [mailto:dhruba@gmail.com]
>> *Sent:* Thursday, January 14, 2010 9:04 AM
>>
>> *To:* hdfs-user@hadoop.apache.org
>> *Subject:* Re: Exponential performance decay when inserting large number
>> of blocks
>>
>> Here is another thing that came to my mind.
>>
>> The Namenode has a hash map in memory where it inserts all blocks. when a
>> new block needs to be allocated, the namenode first generates a random
>> number and checks to see if ti exists in the hashmap. If it does not exist
>> in the hash map, then that number is the block id of the to-be-allocated
>> block. The namenode then inserts this number into the hash map and sends it
>> to te client. The client receives it as the blockid and uses it to write
>> data to the datanode(s).
>>
>> One possibility is that that the time to do a hash-lookup varies depending
>> on the number of blocks in the hash.
>>
>> dhruba
>>
>>
>>
>>
>> On Wed, Jan 13, 2010 at 8:57 PM, alex kamil <al...@gmail.com> wrote:
>>
>>> >launched 8 instances of the bin/hadoop fs -put utility
>>> Zlatin, may be a silly question, are you running dfs -put locally on each
>>> datanode,  or from a single box
>>> Also where are you copying the data from, do you have local copies on
>>> each node before the insert or all your files reside on a single server, or
>>> may be on NFS?
>>> i would also chk the network stats on datanodes and namenode and see if
>>> the nics are not saturated, i guess you have enough bandwidth but may be
>>> there is some issue with NIC on the namenode or something, i saw strange
>>> things happening. you can probably monitor the number of conections/sockets,
>>> bandwidth, IO waits, # of threads
>>> if you are writing to dfs from a single location may be there is a
>>> problem on a single node to handle all this outbound traffic, if you are
>>> distributing files in parallel from multiple nodes, than mat be there is an
>>> inbound congestion on namenode or something like that
>>>
>>> if its not the case, i'd explore using distcp utility for copying data in
>>> parallel  (it comes with the distro)
>>> also if you really hit a wall, and have some time, i'd take look at
>>> alternatives to Filesystem API, may be simething like Fuse-DFS and other
>>> packages supported by libhdfs (http://wiki.apache.org/hadoop/LibHDFS)
>>>
>>>
>>> On Wed, Jan 13, 2010 at 11:00 PM, Todd Lipcon <to...@cloudera.com> wrote:
>>>
>>>> Err, ignore that attachment - attached the wrong graph with the right
>>>> labels!
>>>>
>>>> Here's the right graph.
>>>>
>>>> -Todd
>>>>
>>>>
>>>> On Wed, Jan 13, 2010 at 7:53 PM, Todd Lipcon <to...@cloudera.com> wrote:
>>>>
>>>>> On Wed, Jan 13, 2010 at 6:59 PM, Eric Sammer <er...@lifeless.net>wrote:
>>>>>
>>>>>> On 1/13/10 8:12 PM, Zlatin.Balevsky@barclayscapital.com wrote:
>>>>>> > Alex, Dhruba
>>>>>> >
>>>>>> > I repeated the experiment increasing the block size to 32k.  Still
>>>>>> doing
>>>>>> > 8 inserts in parallel, file size now is 512 MB; 11 datanodes.  I was
>>>>>> > also running iostat on one of the datanodes.  Did not notice
>>>>>> anything
>>>>>> > that would explain an exponential slowdown.  There was more activity
>>>>>> > while the inserts were active but far from the limits of the disk
>>>>>> system.
>>>>>>
>>>>>> While creating many blocks, could it be that the replication pipe
>>>>>> lining
>>>>>> is eating up the available handler threads on the data nodes? By
>>>>>> increasing the block size you would see better performance because the
>>>>>> system spends more time writing data to local disk and less time
>>>>>> dealing
>>>>>> with things like replication "overhead." At a small block size, I
>>>>>> could
>>>>>> imagine you're artificially creating a situation where you saturate
>>>>>> the
>>>>>> default size configured thread pools or something weird like that.
>>>>>>
>>>>>> If you're doing 8 inserts in parallel from one machine with 11 nodes
>>>>>> this seems unlikely, but it might be worth looking into. The question
>>>>>> is
>>>>>> if testing with an artificially small block size like this is even a
>>>>>> viable test. At some point the overhead of talking to the name node,
>>>>>> selecting data nodes for a block, and setting up replication pipe
>>>>>> lines
>>>>>> could become some abnormally high percentage of the run time.
>>>>>>
>>>>>>
>>>>> The concern isn't why the insertion is slow, but rather why the scaling
>>>>> curve looks the way it does. Looking at the data, it looks like the
>>>>> insertion rate (blocks per second) is actually related as 1/n where N is the
>>>>> number of blocks. Attaching another graph of the same data which I think is
>>>>> a little clearer to read.
>>>>>
>>>>>
>>>>>> Also, I wonder if the cluster is trying to rebalance blocks toward the
>>>>>> end of your runtime (if the balancer daemon is running) and this is
>>>>>> causing additional shuffling of data.
>>>>>>
>>>>>
>>>>> That's certainly one possibility.
>>>>>
>>>>> Zlatin: here's a test to try: after the FS is full with 400,000 blocks,
>>>>> let the cluster sit for a few hours, then come back and start another
>>>>> insertion. Is the rate slow, or does it return to the fast starting speed?
>>>>>
>>>>> -Todd
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Connect to me at http://www.facebook.com/dhruba
>>
>>
>
>

Re: Exponential performance decay when inserting large number of blocks

Posted by alex kamil <al...@gmail.com>.
>I'm doing the insert from a node on the same rack as the cluster but it is
not part of it.
So it looks like you are copying from a single node, I'd try to run the
inserts from multiple nodes in parallel, to avoid IO and/or CPU and/or
network bottleneck on the "source" node. Try to upload from multiple
locations.

On Thu, Jan 14, 2010 at 12:13 PM, <Zl...@barclayscapital.com>wrote:

>   More general info:
>
> I'm doing the insert from a node on the same rack as the cluster but it is
> not part of it.  Data is being read from a local disk and the datanodes
> store to the local partitions as well.  The filesystem is ext3, but if it
> were an inode issue the 4-node cluster would perform much worse than the
> 11-node.  No MR jobs or any other activity is present - these are test
> clusters that I create and remove with HOD.  I'm using revision 897952 (the
> hdfs ui reports 897347 for some reason) , checked out the branch-0.20 a few
> days ago.
>
> Todd:
>
> I will repeat the test, waiting several hours after the first round of
> inserts.  Unless the balancer daemon starts by default, I have not started
> it.  The datablocks seemed uniformly spread amongst the datanodes.  I've
> added two additional metrics to be recorded by the datanode -
> DataNode.xmitsInProgress and DataNode.getXCeiverCount().  These are polled
> every 10 seconds.  If anyone wants me to add additional metrics at any
> component let me know.
>
>  > The test case is making files with a ton of blocks. Appending a block
> to an end of a file might be O(n) -
> > usually this isn't a problem since even large files are almost always
> <100 blocks, and the majority <10.
> > In the test, there are files with 50,000+ blocks, so O(n) runtime
> anywhere in the blocklist for a file is pretty bad.
>
> The files in the first test are 16k blocks each.  I am inserting the same
> file under different filenames in consecutive runs.  If that were the
> reason, the first insert should take the same amount of time as the last.
> Nevertheless, I will run the next test with 4k blocks per file and increase
> the number of consecutive insertions.
>
> Dhruba:
> Unless there is a very high number of collisions, the hashmap should
> perform in constant time.  Even if there were collisions, I would be seeing
> much higher CPU usage on the NameNode.  According to the metrics I've
> already sent, towards the end of the test the capacity of the BlockMap
> was 512k and the load approaching 0.66.
>
> Best Regards,
> Zlatin Balevsky
>
> P.S. I could not find contact info for the HOD developers.  I'd like to
> ask them to document the "walltime" and "idleness-limit" parameters!
>
>  ------------------------------
> *From:* Dhruba Borthakur [mailto:dhruba@gmail.com]
> *Sent:* Thursday, January 14, 2010 9:04 AM
>
> *To:* hdfs-user@hadoop.apache.org
> *Subject:* Re: Exponential performance decay when inserting large number
> of blocks
>
> Here is another thing that came to my mind.
>
> The Namenode has a hash map in memory where it inserts all blocks. when a
> new block needs to be allocated, the namenode first generates a random
> number and checks to see if ti exists in the hashmap. If it does not exist
> in the hash map, then that number is the block id of the to-be-allocated
> block. The namenode then inserts this number into the hash map and sends it
> to te client. The client receives it as the blockid and uses it to write
> data to the datanode(s).
>
> One possibility is that that the time to do a hash-lookup varies depending
> on the number of blocks in the hash.
>
> dhruba
>
>
>
>
> On Wed, Jan 13, 2010 at 8:57 PM, alex kamil <al...@gmail.com> wrote:
>
>> >launched 8 instances of the bin/hadoop fs -put utility
>> Zlatin, may be a silly question, are you running dfs -put locally on each
>> datanode,  or from a single box
>> Also where are you copying the data from, do you have local copies on each
>> node before the insert or all your files reside on a single server, or may
>> be on NFS?
>> i would also chk the network stats on datanodes and namenode and see if
>> the nics are not saturated, i guess you have enough bandwidth but may be
>> there is some issue with NIC on the namenode or something, i saw strange
>> things happening. you can probably monitor the number of conections/sockets,
>> bandwidth, IO waits, # of threads
>> if you are writing to dfs from a single location may be there is a problem
>> on a single node to handle all this outbound traffic, if you are
>> distributing files in parallel from multiple nodes, than mat be there is an
>> inbound congestion on namenode or something like that
>>
>> if its not the case, i'd explore using distcp utility for copying data in
>> parallel  (it comes with the distro)
>> also if you really hit a wall, and have some time, i'd take look at
>> alternatives to Filesystem API, may be simething like Fuse-DFS and other
>> packages supported by libhdfs (http://wiki.apache.org/hadoop/LibHDFS)
>>
>>
>> On Wed, Jan 13, 2010 at 11:00 PM, Todd Lipcon <to...@cloudera.com> wrote:
>>
>>> Err, ignore that attachment - attached the wrong graph with the right
>>> labels!
>>>
>>> Here's the right graph.
>>>
>>> -Todd
>>>
>>>
>>> On Wed, Jan 13, 2010 at 7:53 PM, Todd Lipcon <to...@cloudera.com> wrote:
>>>
>>>> On Wed, Jan 13, 2010 at 6:59 PM, Eric Sammer <er...@lifeless.net> wrote:
>>>>
>>>>> On 1/13/10 8:12 PM, Zlatin.Balevsky@barclayscapital.com wrote:
>>>>> > Alex, Dhruba
>>>>> >
>>>>> > I repeated the experiment increasing the block size to 32k.  Still
>>>>> doing
>>>>> > 8 inserts in parallel, file size now is 512 MB; 11 datanodes.  I was
>>>>> > also running iostat on one of the datanodes.  Did not notice anything
>>>>> > that would explain an exponential slowdown.  There was more activity
>>>>> > while the inserts were active but far from the limits of the disk
>>>>> system.
>>>>>
>>>>> While creating many blocks, could it be that the replication pipe
>>>>> lining
>>>>> is eating up the available handler threads on the data nodes? By
>>>>> increasing the block size you would see better performance because the
>>>>> system spends more time writing data to local disk and less time
>>>>> dealing
>>>>> with things like replication "overhead." At a small block size, I could
>>>>> imagine you're artificially creating a situation where you saturate the
>>>>> default size configured thread pools or something weird like that.
>>>>>
>>>>> If you're doing 8 inserts in parallel from one machine with 11 nodes
>>>>> this seems unlikely, but it might be worth looking into. The question
>>>>> is
>>>>> if testing with an artificially small block size like this is even a
>>>>> viable test. At some point the overhead of talking to the name node,
>>>>> selecting data nodes for a block, and setting up replication pipe lines
>>>>> could become some abnormally high percentage of the run time.
>>>>>
>>>>>
>>>> The concern isn't why the insertion is slow, but rather why the scaling
>>>> curve looks the way it does. Looking at the data, it looks like the
>>>> insertion rate (blocks per second) is actually related as 1/n where N is the
>>>> number of blocks. Attaching another graph of the same data which I think is
>>>> a little clearer to read.
>>>>
>>>>
>>>>> Also, I wonder if the cluster is trying to rebalance blocks toward the
>>>>> end of your runtime (if the balancer daemon is running) and this is
>>>>> causing additional shuffling of data.
>>>>>
>>>>
>>>> That's certainly one possibility.
>>>>
>>>> Zlatin: here's a test to try: after the FS is full with 400,000 blocks,
>>>> let the cluster sit for a few hours, then come back and start another
>>>> insertion. Is the rate slow, or does it return to the fast starting speed?
>>>>
>>>> -Todd
>>>>
>>>
>>>
>>
>
>
> --
> Connect to me at http://www.facebook.com/dhruba
>
>

RE: Exponential performance decay when inserting large number of blocks

Posted by Zl...@barclayscapital.com.
More general info:
 
I'm doing the insert from a node on the same rack as the cluster but it
is not part of it.  Data is being read from a local disk and the
datanodes store to the local partitions as well.  The filesystem is
ext3, but if it were an inode issue the 4-node cluster would perform
much worse than the 11-node.  No MR jobs or any other activity is
present - these are test clusters that I create and remove with HOD.
I'm using revision 897952 (the hdfs ui reports 897347 for some reason),
checked out from branch-0.20 a few days ago.
 
Todd:
 
I will repeat the test, waiting several hours after the first round of
inserts.  Unless the balancer daemon starts by default, I have not
started it.  The datablocks seemed uniformly spread amongst the
datanodes.  I've added two additional metrics to be recorded by the
datanode - DataNode.xmitsInProgress and DataNode.getXCeiverCount().
These are polled every 10 seconds.  If anyone wants me to add additional
metrics at any component let me know.
 
> The test case is making files with a ton of blocks. Appending a block
to an end of a file might be O(n) - 
> usually this isn't a problem since even large files are almost always
<100 blocks, and the majority <10. 
> In the test, there are files with 50,000+ blocks, so O(n) runtime
anywhere in the blocklist for a file is pretty bad.


The files in the first test are 16k blocks each.  I am inserting the
same file under different filenames in consecutive runs.  If that were
the reason, the first insert should take the same amount of time as the
last.  Nevertheless, I will run the next test with 4k blocks per file
and increase the number of consecutive insertions.
 
Dhruba:
Unless there is a very high number of collisions, the hashmap should
perform in constant time.  Even if there were collisions, I would be
seeing much higher CPU usage on the NameNode.  According to the metrics
I've already sent, towards the end of the test the capacity of the
BlockMap was 512k and the load approaching 0.66.  
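
For reference, standard hashing analysis backs this up (rough numbers, my
own arithmetic, assuming separate chaining as in java.util.HashMap, with
load factor \alpha = entries / capacity):

    E[\text{probes, successful lookup}] \approx 1 + \frac{\alpha}{2},
    \qquad
    E[\text{probes, unsuccessful lookup}] \approx 1 + \alpha

At \alpha \approx 0.66 both are under two probes, so lookups in the block
map should stay effectively constant-time as the block count grows.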
 
Best Regards,
Zlatin Balevsky
 
P.S. I could not find contact info for the HOD developers.  I'd like to
ask them to document the "walltime" and "idleness-limit" parameters!
 
________________________________

From: Dhruba Borthakur [mailto:dhruba@gmail.com] 
Sent: Thursday, January 14, 2010 9:04 AM
To: hdfs-user@hadoop.apache.org
Subject: Re: Exponential performance decay when inserting large number
of blocks


Here is another thing that came to my mind.

The Namenode has a hash map in memory where it inserts all blocks. when
a new block needs to be allocated, the namenode first generates a random
number and checks to see if ti exists in the hashmap. If it does not
exist in the hash map, then that number is the block id of the
to-be-allocated block. The namenode then inserts this number into the
hash map and sends it to te client. The client receives it as the
blockid and uses it to write data to the datanode(s).

One possibility is that that the time to do a hash-lookup varies
depending on the number of blocks in the hash.

dhruba





On Wed, Jan 13, 2010 at 8:57 PM, alex kamil <al...@gmail.com>
wrote:


	>launched 8 instances of the bin/hadoop fs -put utility
	Zlatin, may be a silly question, are you running dfs -put
locally on each datanode,  or from a single box 
	Also where are you copying the data from, do you have local
copies on each node before the insert or all your files reside on a
single server, or may be on NFS?
	i would also chk the network stats on datanodes and namenode and
see if the nics are not saturated, i guess you have enough bandwidth but
may be there is some issue with NIC on the namenode or something, i saw
strange things happening. you can probably monitor the number of
conections/sockets, bandwidth, IO waits, # of threads 
	if you are writing to dfs from a single location may be there is
a problem on a single node to handle all this outbound traffic, if you
are distributing files in parallel from multiple nodes, than mat be
there is an inbound congestion on namenode or something like that
	
	if its not the case, i'd explore using distcp utility for
copying data in parallel  (it comes with the distro)
	also if you really hit a wall, and have some time, i'd take look
at alternatives to Filesystem API, may be simething like Fuse-DFS and
other packages supported by libhdfs
(http://wiki.apache.org/hadoop/LibHDFS)
	
	
	
	On Wed, Jan 13, 2010 at 11:00 PM, Todd Lipcon
<to...@cloudera.com> wrote:
	

		Err, ignore that attachment - attached the wrong graph
with the right labels! 

		Here's the right graph.

		-Todd 


		On Wed, Jan 13, 2010 at 7:53 PM, Todd Lipcon
<to...@cloudera.com> wrote:
		

			On Wed, Jan 13, 2010 at 6:59 PM, Eric Sammer
<er...@lifeless.net> wrote:
			

				On 1/13/10 8:12 PM,
Zlatin.Balevsky@barclayscapital.com wrote:
				> Alex, Dhruba
				>
				> I repeated the experiment increasing
the block size to 32k.  Still doing
				> 8 inserts in parallel, file size now
is 512 MB; 11 datanodes.  I was
				> also running iostat on one of the
datanodes.  Did not notice anything
				> that would explain an exponential
slowdown.  There was more activity
				> while the inserts were active but far
from the limits of the disk system.
				
				
				While creating many blocks, could it be
that the replication pipe lining
				is eating up the available handler
threads on the data nodes? By
				increasing the block size you would see
better performance because the
				system spends more time writing data to
local disk and less time dealing
				with things like replication "overhead."
At a small block size, I could
				imagine you're artificially creating a
situation where you saturate the
				default size configured thread pools or
something weird like that.
				
				If you're doing 8 inserts in parallel
from one machine with 11 nodes
				this seems unlikely, but it might be
worth looking into. The question is
				if testing with an artificially small
block size like this is even a
				viable test. At some point the overhead
of talking to the name node,
				selecting data nodes for a block, and
setting up replication pipe lines
				could become some abnormally high
percentage of the run time.
				
				


			The concern isn't why the insertion is slow, but
rather why the scaling curve looks the way it does. Looking at the data,
it looks like the insertion rate (blocks per second) is actually related
as 1/n where N is the number of blocks. Attaching another graph of the
same data which I think is a little clearer to read.
			 

				Also, I wonder if the cluster is trying
to rebalance blocks toward the
				end of your runtime (if the balancer
daemon is running) and this is
				causing additional shuffling of data.
				


			That's certainly one possibility.

			Zlatin: here's a test to try: after the FS is
full with 400,000 blocks, let the cluster sit for a few hours, then come
back and start another insertion. Is the rate slow, or does it return to
the fast starting speed?

			
			-Todd






-- 
Connect to me at http://www.facebook.com/dhruba



Re: Exponential performance decay when inserting large number of blocks

Posted by Todd Lipcon <to...@cloudera.com>.
I have another conjecture about this:

The test case is making files with a ton of blocks. Appending a block to the
end of a file might be O(n) - usually this isn't a problem since even large
files are almost always <100 blocks, and the majority <10. In the test,
there are files with 50,000+ blocks, so O(n) runtime anywhere in the
blocklist for a file is pretty bad.
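
For illustration only (a guess at one possible mechanism, not a claim about
what branch-0.20 actually does): if a file's block list were kept as a plain
array that is reallocated and copied on every append, the k-th append would
cost O(k), and building a 50,000-block file would cost O(n^2) in total:

    // Hypothetical array-backed block list with copy-on-append.
    class FileBlockList {
        private long[] blockIds = new long[0];

        void addBlock(long blockId) {
            long[] grown = new long[blockIds.length + 1];              // new array on every append
            System.arraycopy(blockIds, 0, grown, 0, blockIds.length);  // O(n) copy
            grown[blockIds.length] = blockId;
            blockIds = grown;
        }
    }

Whether this alone explains the cross-run slowdown is a separate question,
since each new file's list starts out empty.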

-Todd

On Thu, Jan 14, 2010 at 6:03 AM, Dhruba Borthakur <dh...@gmail.com> wrote:

> Here is another thing that came to my mind.
>
> The Namenode has a hash map in memory where it inserts all blocks. when a
> new block needs to be allocated, the namenode first generates a random
> number and checks to see if ti exists in the hashmap. If it does not exist
> in the hash map, then that number is the block id of the to-be-allocated
> block. The namenode then inserts this number into the hash map and sends it
> to te client. The client receives it as the blockid and uses it to write
> data to the datanode(s).
>
> One possibility is that that the time to do a hash-lookup varies depending
> on the number of blocks in the hash.
>
> dhruba
>
>
>
>
>
> On Wed, Jan 13, 2010 at 8:57 PM, alex kamil <al...@gmail.com> wrote:
>
>> >launched 8 instances of the bin/hadoop fs -put utility
>> Zlatin, may be a silly question, are you running dfs -put locally on each
>> datanode,  or from a single box
>> Also where are you copying the data from, do you have local copies on each
>> node before the insert or all your files reside on a single server, or may
>> be on NFS?
>> i would also chk the network stats on datanodes and namenode and see if
>> the nics are not saturated, i guess you have enough bandwidth but may be
>> there is some issue with NIC on the namenode or something, i saw strange
>> things happening. you can probably monitor the number of conections/sockets,
>> bandwidth, IO waits, # of threads
>> if you are writing to dfs from a single location may be there is a problem
>> on a single node to handle all this outbound traffic, if you are
>> distributing files in parallel from multiple nodes, than mat be there is an
>> inbound congestion on namenode or something like that
>>
>> if its not the case, i'd explore using distcp utility for copying data in
>> parallel  (it comes with the distro)
>> also if you really hit a wall, and have some time, i'd take look at
>> alternatives to Filesystem API, may be simething like Fuse-DFS and other
>> packages supported by libhdfs (http://wiki.apache.org/hadoop/LibHDFS)
>>
>>
>> On Wed, Jan 13, 2010 at 11:00 PM, Todd Lipcon <to...@cloudera.com> wrote:
>>
>>> Err, ignore that attachment - attached the wrong graph with the right
>>> labels!
>>>
>>> Here's the right graph.
>>>
>>> -Todd
>>>
>>>
>>> On Wed, Jan 13, 2010 at 7:53 PM, Todd Lipcon <to...@cloudera.com> wrote:
>>>
>>>> On Wed, Jan 13, 2010 at 6:59 PM, Eric Sammer <er...@lifeless.net> wrote:
>>>>
>>>>> On 1/13/10 8:12 PM, Zlatin.Balevsky@barclayscapital.com wrote:
>>>>> > Alex, Dhruba
>>>>> >
>>>>> > I repeated the experiment increasing the block size to 32k.  Still
>>>>> doing
>>>>> > 8 inserts in parallel, file size now is 512 MB; 11 datanodes.  I was
>>>>> > also running iostat on one of the datanodes.  Did not notice anything
>>>>> > that would explain an exponential slowdown.  There was more activity
>>>>> > while the inserts were active but far from the limits of the disk
>>>>> system.
>>>>>
>>>>> While creating many blocks, could it be that the replication pipe
>>>>> lining
>>>>> is eating up the available handler threads on the data nodes? By
>>>>> increasing the block size you would see better performance because the
>>>>> system spends more time writing data to local disk and less time
>>>>> dealing
>>>>> with things like replication "overhead." At a small block size, I could
>>>>> imagine you're artificially creating a situation where you saturate the
>>>>> default size configured thread pools or something weird like that.
>>>>>
>>>>> If you're doing 8 inserts in parallel from one machine with 11 nodes
>>>>> this seems unlikely, but it might be worth looking into. The question
>>>>> is
>>>>> if testing with an artificially small block size like this is even a
>>>>> viable test. At some point the overhead of talking to the name node,
>>>>> selecting data nodes for a block, and setting up replication pipe lines
>>>>> could become some abnormally high percentage of the run time.
>>>>>
>>>>>
>>>> The concern isn't why the insertion is slow, but rather why the scaling
>>>> curve looks the way it does. Looking at the data, it looks like the
>>>> insertion rate (blocks per second) is actually related as 1/n where N is the
>>>> number of blocks. Attaching another graph of the same data which I think is
>>>> a little clearer to read.
>>>>
>>>>
>>>>> Also, I wonder if the cluster is trying to rebalance blocks toward the
>>>>> end of your runtime (if the balancer daemon is running) and this is
>>>>> causing additional shuffling of data.
>>>>>
>>>>
>>>> That's certainly one possibility.
>>>>
>>>> Zlatin: here's a test to try: after the FS is full with 400,000 blocks,
>>>> let the cluster sit for a few hours, then come back and start another
>>>> insertion. Is the rate slow, or does it return to the fast starting speed?
>>>>
>>>> -Todd
>>>>
>>>
>>>
>>
>
>
> --
> Connect to me at http://www.facebook.com/dhruba
>

Re: Exponential performance decay when inserting large number of blocks

Posted by Dhruba Borthakur <dh...@gmail.com>.
Here is another thing that came to my mind.

The Namenode has a hash map in memory where it inserts all blocks. When a
new block needs to be allocated, the namenode first generates a random
number and checks to see if it exists in the hashmap. If it does not exist
in the hash map, then that number is the block id of the to-be-allocated
block. The namenode then inserts this number into the hash map and sends it
to the client. The client receives it as the blockid and uses it to write
data to the datanode(s).

One possibility is that the time to do a hash-lookup varies depending
on the number of blocks in the hash.
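
A minimal sketch of that allocation scheme, purely for illustration (this is
not the actual NameNode code; the class and method names below are made up):

    import java.util.HashSet;
    import java.util.Random;
    import java.util.Set;

    // Hypothetical stand-in for the NameNode's block id bookkeeping.
    public class BlockIdAllocator {
        private final Set<Long> allocatedIds = new HashSet<Long>();
        private final Random random = new Random();

        // Draw random ids until one is unused, then record it.
        public synchronized long allocateBlockId() {
            long id = random.nextLong();
            while (allocatedIds.contains(id)) {   // collisions should be very rare
                id = random.nextLong();
            }
            allocatedIds.add(id);
            return id;
        }
    }

Both contains() and add() on a HashSet are expected O(1) regardless of how
many ids are already stored, so if this path were the bottleneck the hashing
itself would have to be degrading in some unusual way.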

dhruba




On Wed, Jan 13, 2010 at 8:57 PM, alex kamil <al...@gmail.com> wrote:

> >launched 8 instances of the bin/hadoop fs -put utility
> Zlatin, may be a silly question, are you running dfs -put locally on each
> datanode,  or from a single box
> Also where are you copying the data from, do you have local copies on each
> node before the insert or all your files reside on a single server, or may
> be on NFS?
> i would also chk the network stats on datanodes and namenode and see if the
> nics are not saturated, i guess you have enough bandwidth but may be there
> is some issue with NIC on the namenode or something, i saw strange things
> happening. you can probably monitor the number of conections/sockets,
> bandwidth, IO waits, # of threads
> if you are writing to dfs from a single location may be there is a problem
> on a single node to handle all this outbound traffic, if you are
> distributing files in parallel from multiple nodes, than mat be there is an
> inbound congestion on namenode or something like that
>
> if its not the case, i'd explore using distcp utility for copying data in
> parallel  (it comes with the distro)
> also if you really hit a wall, and have some time, i'd take look at
> alternatives to Filesystem API, may be simething like Fuse-DFS and other
> packages supported by libhdfs (http://wiki.apache.org/hadoop/LibHDFS)
>
>
> On Wed, Jan 13, 2010 at 11:00 PM, Todd Lipcon <to...@cloudera.com> wrote:
>
>> Err, ignore that attachment - attached the wrong graph with the right
>> labels!
>>
>> Here's the right graph.
>>
>> -Todd
>>
>>
>> On Wed, Jan 13, 2010 at 7:53 PM, Todd Lipcon <to...@cloudera.com> wrote:
>>
>>> On Wed, Jan 13, 2010 at 6:59 PM, Eric Sammer <er...@lifeless.net> wrote:
>>>
>>>> On 1/13/10 8:12 PM, Zlatin.Balevsky@barclayscapital.com wrote:
>>>> > Alex, Dhruba
>>>> >
>>>> > I repeated the experiment increasing the block size to 32k.  Still
>>>> doing
>>>> > 8 inserts in parallel, file size now is 512 MB; 11 datanodes.  I was
>>>> > also running iostat on one of the datanodes.  Did not notice anything
>>>> > that would explain an exponential slowdown.  There was more activity
>>>> > while the inserts were active but far from the limits of the disk
>>>> system.
>>>>
>>>> While creating many blocks, could it be that the replication pipe lining
>>>> is eating up the available handler threads on the data nodes? By
>>>> increasing the block size you would see better performance because the
>>>> system spends more time writing data to local disk and less time dealing
>>>> with things like replication "overhead." At a small block size, I could
>>>> imagine you're artificially creating a situation where you saturate the
>>>> default size configured thread pools or something weird like that.
>>>>
>>>> If you're doing 8 inserts in parallel from one machine with 11 nodes
>>>> this seems unlikely, but it might be worth looking into. The question is
>>>> if testing with an artificially small block size like this is even a
>>>> viable test. At some point the overhead of talking to the name node,
>>>> selecting data nodes for a block, and setting up replication pipe lines
>>>> could become some abnormally high percentage of the run time.
>>>>
>>>>
>>> The concern isn't why the insertion is slow, but rather why the scaling
>>> curve looks the way it does. Looking at the data, it looks like the
>>> insertion rate (blocks per second) is actually related as 1/n where N is the
>>> number of blocks. Attaching another graph of the same data which I think is
>>> a little clearer to read.
>>>
>>>
>>>> Also, I wonder if the cluster is trying to rebalance blocks toward the
>>>> end of your runtime (if the balancer daemon is running) and this is
>>>> causing additional shuffling of data.
>>>>
>>>
>>> That's certainly one possibility.
>>>
>>> Zlatin: here's a test to try: after the FS is full with 400,000 blocks,
>>> let the cluster sit for a few hours, then come back and start another
>>> insertion. Is the rate slow, or does it return to the fast starting speed?
>>>
>>> -Todd
>>>
>>
>>
>


-- 
Connect to me at http://www.facebook.com/dhruba

Re: Exponential performance decay when inserting large number of blocks

Posted by alex kamil <al...@gmail.com>.
>launched 8 instances of the bin/hadoop fs -put utility
Zlatin, maybe a silly question: are you running dfs -put locally on each
datanode, or from a single box?
Also, where are you copying the data from? Do you have local copies on each
node before the insert, or do all your files reside on a single server, or
maybe on NFS?
I would also check the network stats on datanodes and namenode and see if the
NICs are not saturated. I guess you have enough bandwidth, but maybe there
is some issue with the NIC on the namenode or something; I have seen strange
things happen. You can probably monitor the number of connections/sockets,
bandwidth, IO waits, # of threads.
If you are writing to dfs from a single location, maybe there is a problem
with a single node handling all this outbound traffic; if you are
distributing files in parallel from multiple nodes, then maybe there is
inbound congestion on the namenode or something like that.

If that's not the case, I'd explore using the distcp utility for copying data
in parallel (it comes with the distro).
Also, if you really hit a wall and have some time, I'd take a look at
alternatives to the FileSystem API, maybe something like Fuse-DFS and other
packages supported by libhdfs (http://wiki.apache.org/hadoop/LibHDFS)


On Wed, Jan 13, 2010 at 11:00 PM, Todd Lipcon <to...@cloudera.com> wrote:

> Err, ignore that attachment - attached the wrong graph with the right
> labels!
>
> Here's the right graph.
>
> -Todd
>
>
> On Wed, Jan 13, 2010 at 7:53 PM, Todd Lipcon <to...@cloudera.com> wrote:
>
>> On Wed, Jan 13, 2010 at 6:59 PM, Eric Sammer <er...@lifeless.net> wrote:
>>
>>> On 1/13/10 8:12 PM, Zlatin.Balevsky@barclayscapital.com wrote:
>>> > Alex, Dhruba
>>> >
>>> > I repeated the experiment increasing the block size to 32k.  Still
>>> doing
>>> > 8 inserts in parallel, file size now is 512 MB; 11 datanodes.  I was
>>> > also running iostat on one of the datanodes.  Did not notice anything
>>> > that would explain an exponential slowdown.  There was more activity
>>> > while the inserts were active but far from the limits of the disk
>>> system.
>>>
>>> While creating many blocks, could it be that the replication pipe lining
>>> is eating up the available handler threads on the data nodes? By
>>> increasing the block size you would see better performance because the
>>> system spends more time writing data to local disk and less time dealing
>>> with things like replication "overhead." At a small block size, I could
>>> imagine you're artificially creating a situation where you saturate the
>>> default size configured thread pools or something weird like that.
>>>
>>> If you're doing 8 inserts in parallel from one machine with 11 nodes
>>> this seems unlikely, but it might be worth looking into. The question is
>>> if testing with an artificially small block size like this is even a
>>> viable test. At some point the overhead of talking to the name node,
>>> selecting data nodes for a block, and setting up replication pipe lines
>>> could become some abnormally high percentage of the run time.
>>>
>>>
>> The concern isn't why the insertion is slow, but rather why the scaling
>> curve looks the way it does. Looking at the data, it looks like the
>> insertion rate (blocks per second) is actually related as 1/n where N is the
>> number of blocks. Attaching another graph of the same data which I think is
>> a little clearer to read.
>>
>>
>>> Also, I wonder if the cluster is trying to rebalance blocks toward the
>>> end of your runtime (if the balancer daemon is running) and this is
>>> causing additional shuffling of data.
>>>
>>
>> That's certainly one possibility.
>>
>> Zlatin: here's a test to try: after the FS is full with 400,000 blocks,
>> let the cluster sit for a few hours, then come back and start another
>> insertion. Is the rate slow, or does it return to the fast starting speed?
>>
>> -Todd
>>
>
>

Re: Exponential performance decay when inserting large number of blocks

Posted by Todd Lipcon <to...@cloudera.com>.
Err, ignore that attachment - attached the wrong graph with the right
labels!

Here's the right graph.

-Todd

On Wed, Jan 13, 2010 at 7:53 PM, Todd Lipcon <to...@cloudera.com> wrote:

> On Wed, Jan 13, 2010 at 6:59 PM, Eric Sammer <er...@lifeless.net> wrote:
>
>> On 1/13/10 8:12 PM, Zlatin.Balevsky@barclayscapital.com wrote:
>> > Alex, Dhruba
>> >
>> > I repeated the experiment increasing the block size to 32k.  Still doing
>> > 8 inserts in parallel, file size now is 512 MB; 11 datanodes.  I was
>> > also running iostat on one of the datanodes.  Did not notice anything
>> > that would explain an exponential slowdown.  There was more activity
>> > while the inserts were active but far from the limits of the disk
>> system.
>>
>> While creating many blocks, could it be that the replication pipe lining
>> is eating up the available handler threads on the data nodes? By
>> increasing the block size you would see better performance because the
>> system spends more time writing data to local disk and less time dealing
>> with things like replication "overhead." At a small block size, I could
>> imagine you're artificially creating a situation where you saturate the
>> default size configured thread pools or something weird like that.
>>
>> If you're doing 8 inserts in parallel from one machine with 11 nodes
>> this seems unlikely, but it might be worth looking into. The question is
>> if testing with an artificially small block size like this is even a
>> viable test. At some point the overhead of talking to the name node,
>> selecting data nodes for a block, and setting up replication pipe lines
>> could become some abnormally high percentage of the run time.
>>
>>
> The concern isn't why the insertion is slow, but rather why the scaling
> curve looks the way it does. Looking at the data, it looks like the
> insertion rate (blocks per second) is actually related as 1/n where N is the
> number of blocks. Attaching another graph of the same data which I think is
> a little clearer to read.
>
>
>> Also, I wonder if the cluster is trying to rebalance blocks toward the
>> end of your runtime (if the balancer daemon is running) and this is
>> causing additional shuffling of data.
>>
>
> That's certainly one possibility.
>
> Zlatin: here's a test to try: after the FS is full with 400,000 blocks, let
> the cluster sit for a few hours, then come back and start another insertion.
> Is the rate slow, or does it return to the fast starting speed?
>
> -Todd
>

Re: Exponential performance decay when inserting large number of blocks

Posted by Todd Lipcon <to...@cloudera.com>.
On Wed, Jan 13, 2010 at 6:59 PM, Eric Sammer <er...@lifeless.net> wrote:

> On 1/13/10 8:12 PM, Zlatin.Balevsky@barclayscapital.com wrote:
> > Alex, Dhruba
> >
> > I repeated the experiment increasing the block size to 32k.  Still doing
> > 8 inserts in parallel, file size now is 512 MB; 11 datanodes.  I was
> > also running iostat on one of the datanodes.  Did not notice anything
> > that would explain an exponential slowdown.  There was more activity
> > while the inserts were active but far from the limits of the disk system.
>
> While creating many blocks, could it be that the replication pipe lining
> is eating up the available handler threads on the data nodes? By
> increasing the block size you would see better performance because the
> system spends more time writing data to local disk and less time dealing
> with things like replication "overhead." At a small block size, I could
> imagine you're artificially creating a situation where you saturate the
> default size configured thread pools or something weird like that.
>
> If you're doing 8 inserts in parallel from one machine with 11 nodes
> this seems unlikely, but it might be worth looking into. The question is
> if testing with an artificially small block size like this is even a
> viable test. At some point the overhead of talking to the name node,
> selecting data nodes for a block, and setting up replication pipe lines
> could become some abnormally high percentage of the run time.
>
>
The concern isn't why the insertion is slow, but rather why the scaling
curve looks the way it does. Looking at the data, it looks like the
insertion rate (blocks per second) actually goes as 1/n, where n is the
number of blocks already inserted. Attaching another graph of the same data
which I think is a little clearer to read.
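
A quick back-of-the-envelope check on that reading (my own arithmetic, not
taken from the measured data): a rate proportional to 1/n is exactly what
you get when each new block costs time proportional to the number of blocks
already present,

    \frac{dn}{dt} = \frac{c}{n}
    \;\Rightarrow\;
    n(t) = \sqrt{2ct}
    \;\Rightarrow\;
    T(N) = \frac{N^2}{2c},

so the total time for N blocks grows quadratically and each successive run
of the same size should take noticeably longer than the one before it.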


> Also, I wonder if the cluster is trying to rebalance blocks toward the
> end of your runtime (if the balancer daemon is running) and this is
> causing additional shuffling of data.
>

That's certainly one possibility.

Zlatin: here's a test to try: after the FS is full with 400,000 blocks, let
the cluster sit for a few hours, then come back and start another insertion.
Is the rate slow, or does it return to the fast starting speed?

-Todd

Re: Exponential performance decay when inserting large number of blocks

Posted by Eric Sammer <er...@lifeless.net>.
On 1/13/10 8:12 PM, Zlatin.Balevsky@barclayscapital.com wrote:
> Alex, Dhruba
>  
> I repeated the experiment increasing the block size to 32k.  Still doing
> 8 inserts in parallel, file size now is 512 MB; 11 datanodes.  I was
> also running iostat on one of the datanodes.  Did not notice anything
> that would explain an exponential slowdown.  There was more activity
> while the inserts were active but far from the limits of the disk system.

While creating many blocks, could it be that the replication pipelining
is eating up the available handler threads on the data nodes? By
increasing the block size you would see better performance because the
system spends more time writing data to local disk and less time dealing
with things like replication "overhead." At a small block size, I could
imagine you're artificially creating a situation where you saturate the
default-sized thread pools or something weird like that.

If you're doing 8 inserts in parallel from one machine with 11 nodes
this seems unlikely, but it might be worth looking into. The question is
whether testing with an artificially small block size like this is even a
viable test. At some point the overhead of talking to the name node,
selecting data nodes for a block, and setting up replication pipelines
could become an abnormally high percentage of the run time.

Also, I wonder if the cluster is trying to rebalance blocks toward the
end of your runtime (if the balancer daemon is running) and this is
causing additional shuffling of data.

Just throwing ideas out there. I don't know if this is reasonable at
all. I've never tested with a small block size like that and I don't
know the exact amount of overhead in some of these bits of the code.

Regards.
-- 
Eric Sammer
eric@lifeless.net
http://esammer.blogspot.com

Re: Exponential performance decay when inserting large number of blocks

Posted by alex kamil <al...@gmail.com>.
Hmm, I used to work with a single GB file rather than many 16MB files.
In Tom White's book "Hadoop: The Definitive Guide" he also recommends
merging many files into one (with a simple awk script, for example, or just
a cat command if the start and end of each file are easily identifiable). He
shows how to use the InputFormat and OutputFormat interfaces to parse such files.

Cloudera blog has an article about  "small files problem":
http://www.cloudera.com/blog/2009/02/02/the-small-files-problem/
It says: "Compare a 1GB file broken into 16 64MB blocks, and 10,000 or so
100KB files.
The 10,000 files use one map each, and the job time can be tens or hundreds
of times slower than the equivalent one with a single input file"

But in your case your input is 16MB and the block size 32k. It would create
a couple of thousand maps; instead I would probably try a 16MB input
file with a block size of 16MB, so that basically each file would be assigned
a dedicated mapper "thread". Since you have many of those files, let's say a
hundred, Hadoop will create 100 maps (you just need to update the max.map.tasks
parameter).
You can double the number of maps by cutting the block size in half (to 8
MB). Once again, that would only speed up processing; I don't think splitting
the input with different block sizes would affect insert time significantly,
and I think you proved that with your test.

I mean, no matter what the block size is, it still needs to update the
metadata on the namenode and update the datanodes, but now instead of, let's
say, a single write per file (with a 16MB block) it will do it a couple of
thousand times with the smaller 32k blocks.

It's just my 50 cents; I have worked with Hadoop for only a couple of nights
for a term project, so maybe there are some Cloudera guys who can better help.

Cheers
Alex



On Wed, Jan 13, 2010 at 8:12 PM, <Zl...@barclayscapital.com>wrote:

>  Alex, Dhruba
>
> I repeated the experiment increasing the block size to 32k.  Still doing 8
> inserts in parallel, file size now is 512 MB; 11 datanodes.  I was also
> running iostat on one of the datanodes.  Did not notice anything that would
> explain an exponential slowdown.  There was more activity while the inserts
> were active but far from the limits of the disk system.
>
> There is definitely some improvement - I was able to finish 3 rounds of
> inserts for total of 12GB.  However, the shape is still logarithmic.
> Unfortunately, I am constrained with disk space and cannot test a block size
> that is even close to the real world sizes.
>
> Attached are the collected metrics and a graph comparing the three
> inserts.  As before, you'll notice gaps in the metrics between runs of the
> insert script which have been edited out of the graph.
>
> Best Regards,
> Zlatin Balevsky
>
>
>
>  ------------------------------
> *From:* alex kamil [mailto:alex.kamil@gmail.com]
> *Sent:* Wednesday, January 13, 2010 7:03 PM
>
> *To:* hdfs-user@hadoop.apache.org
> *Subject:* Re: Exponential performance decay when inserting large number
> of blocks
>
> Zlatin, nevermind, just noticed that you're measuring the dfs insert time.
> 1k blocks are way too small anyway
>
> On Wed, Jan 13, 2010 at 6:08 PM, alex kamil <al...@gmail.com> wrote:
>
>> Zlatin,
>> i dont know what is the nature of the job that you are running but it
>> looks like you are hitting an io bottleneck, try to multiply the size of
>> input data and test with 3-5GB at least, by changing the block size in this
>> case you can increase the number of maps and scale your app. (Round the
>> block size to a multiple of 512)
>> i attached some results from my experiments with KNN scaling on 4 nodes
>> cluster, it didn't scale linearly but it wasn't too bad, it sometimes depends
>> on the algorithm, with other algos, like PCA/SVD the scaling was linear
>> also note that sometimes the bottleneck is in the Reduce step, i added a
>> Combiner between mapper and reducer in my code and it affected scalability
>> significantly.
>>
>> let me know if you have any questions
>> Alex Kamil
>> ak2834@columbia.edu
>>
>>
>> On Wed, Jan 13, 2010 at 5:35 PM, Dhruba Borthakur <dh...@gmail.com>wrote:
>>
>>> Another thing to observe is the rate of IO on the datanodes. Maybe u can
>>> do a sar/iostat on the datanodes and see if the datanode devices show an
>>> increase in activity while inserting the last lot of blocks. One possibility
>>> is that the OS cache on the datanodes cached most of the data from the first
>>> few runs, but when more and more data started arriving on the datanode it
>>> triggered more flushing of OS buffers. (on the datanode).
>>>
>>> thanks,
>>> dhruba
>>>
>>>
>>>
>>> On Wed, Jan 13, 2010 at 2:18 PM, Todd Lipcon <to...@cloudera.com> wrote:
>>>
>>>> Hey Zlatin,
>>>>
>>>> Thanks for the explanation and the additional data. I'm a bit busy today
>>>> but will try to go through the data and reproduce the results later this
>>>> week.
>>>>
>>>> -Todd
>>>>
>>>>
>>>> On Wed, Jan 13, 2010 at 2:07 PM, <Zl...@barclayscapital.com>wrote:
>>>>
>>>>>  Todd,
>>>>>
>>>>> I used a shell script that launched 8 instances of the bin/hadoop fs
>>>>> -put utility.  After all 8 processes were done and I verified through the web
>>>>> ui that the files were inserted, I re-launched the script manually again.
>>>>> That is why you'll notice that in the metrics there are two short periods
>>>>> without any activity (I edited those out from the graph).  There were
>>>>> occasional NotReplicatedYet exceptions in the logs of those processes, but
>>>>> they were occurring at constant rate.
>>>>>
>>>>> I did not run a profiler, but that will eventually be the next step.
>>>>> I'm attaching the metrics from the namenode and one of the datanodes from
>>>>> the experiment with 4 datanodes.  They were recorded every 10 seconds.  Heap
>>>>> size for all processes is 2GB, and while there was occasional CPU usage on
>>>>> the Namenode it was never 100%.  (and there are plenty of cores).
>>>>>
>>>>> Ultimately the block size will be much larger than the default as the
>>>>> total data will be in the 2^(well over 50) range.  With this test I am
>>>>> trying to determine if there are any bottlenecks at the NameNode  component.
>>>>>
>>>>> Best Regards,
>>>>> Zlatin Balevsky
>>>>>
>>>>>  ------------------------------
>>>>> *From:* Todd Lipcon [mailto:todd@cloudera.com]
>>>>> *Sent:* Wednesday, January 13, 2010 4:34 PM
>>>>> *To:* hdfs-user@hadoop.apache.org
>>>>> *Subject:* Re: Exponential performance decay when inserting large
>>>>> number of blocks
>>>>>
>>>>>   Also, if you have the program you used to do the insertions, and
>>>>> could attach it, I'd be interested in trying to replicate this on a test
>>>>> cluster. If you can't redistribute it, I can start from scratch, but would
>>>>> be easier to run yours.
>>>>>
>>>>> Thanks
>>>>> -Todd
>>>>>
>>>>> On Wed, Jan 13, 2010 at 1:31 PM, Todd Lipcon <to...@cloudera.com>wrote:
>>>>>
>>>>>> Hi Zlatin,
>>>>>>
>>>>>> This is a very interesting test you've run, and certainly not expected
>>>>>> results. I know of many clusters happily chugging along with millions of
>>>>>> blocks, so problems at 400K are very strange. By any chance were you able to
>>>>>> collect profiling information from the NameNode while running this test?
>>>>>>
>>>>>> That said, I hope you've set the block size to 1KB for the purpose of
>>>>>> this test and not because you expect to run that in production. Recommended
>>>>>> block sizes are at least 64MB and often 128MB or 256MB for larger clusters.
>>>>>>
>>>>>> Thanks
>>>>>> -Todd
>>>>>>
>>>>>> On Wed, Jan 13, 2010 at 1:21 PM, <Zlatin.Balevsky@barclayscapital.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I am testing how HDFS scales with very large number of blocks.  I did
>>>>>>> the following setup:
>>>>>>>
>>>>>>> Set the default blocks size to 1KB
>>>>>>> Started 8 insert processes, each inserting a 16MB file
>>>>>>> Repeated the insert 3 times, keeping the already inserted files in
>>>>>>> HDFS
>>>>>>> Repeated the entire experiment on one cluster with 4 and another with
>>>>>>> 11
>>>>>>> identical datanodes (allocated through HOD)
>>>>>>>
>>>>>>> Results:
>>>>>>> The first 128MB (2^18 blocks) insert finished in 5 minutes.  The
>>>>>>> second
>>>>>>> in 12 minutes.  The third didn't finish within 1 hour.  The 11-node
>>>>>>> cluster was marginally faster.
>>>>>>>
>>>>>>> Throughout this I was storing all available metrics.  There were no
>>>>>>> signs of insufficient memory on any of the nodes; CPU usage and
>>>>>>> garbage
>>>>>>> collections were constant throughout.  If anyone is interested I can
>>>>>>> provide the recorded metrics.  I've attached a chart that looks
>>>>>>> clearly
>>>>>>> logarithmic.
>>>>>>>
>>>>>>> Can anyone please point to what could be the bottleneck here?  I'm
>>>>>>> evaluating HDFS for usage scenarios requiring 2^(a lot more than 18)
>>>>>>> blocks.
>>>>>>>
>>>>>>> Best Regards,  <<insertion_rate_4_and_11_datanodes.JPG>>
>>>>>>> Zlatin Balevsky
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>>
>>>>>>> This e-mail may contain information that is confidential, privileged
>>>>>>> or otherwise protected from disclosure. If you are not an intended recipient
>>>>>>> of this e-mail, do not duplicate or redistribute it by any means. Please
>>>>>>> delete it and any attachments and notify the sender that you have received
>>>>>>> it in error. Unless specifically indicated, this e-mail is not an offer to
>>>>>>> buy or sell or a solicitation to buy or sell any securities, investment
>>>>>>> products or other financial product or service, an official confirmation of
>>>>>>> any transaction, or an official statement of Barclays. Any views or opinions
>>>>>>> presented are solely those of the author and do not necessarily represent
>>>>>>> those of Barclays. This e-mail is subject to terms available at the
>>>>>>> following link: www.barcap.com/emaildisclaimer. By messaging with
>>>>>>> Barclays you consent to the foregoing.  Barclays Capital is the investment
>>>>>>> banking division of Barclays Bank PLC, a company registered in England
>>>>>>> (number 1026167) with its registered office at 1 Churchill Place, London,
>>>>>>> E14 5HP.  This email may relate to or be sent from other members of the
>>>>>>> Barclays Group.
>>>>>>> _______________________________________________
>>>>>>>
>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>>
>>>>>
>>>>>
>>>>> This e-mail may contain information that is confidential, privileged or
>>>>> otherwise protected from disclosure. If you are not an intended recipient of
>>>>> this e-mail, do not duplicate or redistribute it by any means. Please delete
>>>>> it and any attachments and notify the sender that you have received it in
>>>>> error. Unless specifically indicated, this e-mail is not an offer to buy or
>>>>> sell or a solicitation to buy or sell any securities, investment products or
>>>>> other financial product or service, an official confirmation of any
>>>>> transaction, or an official statement of Barclays. Any views or opinions
>>>>> presented are solely those of the author and do not necessarily represent
>>>>> those of Barclays. This e-mail is subject to terms available at the
>>>>> following link: www.barcap.com/emaildisclaimer. By messaging with
>>>>> Barclays you consent to the foregoing.  Barclays Capital is the
>>>>> investment banking division of Barclays Bank PLC, a company registered in
>>>>> England (number 1026167) with its registered office at 1 Churchill Place,
>>>>> London, E14 5HP.  This email may relate to or be sent from other
>>>>> members of the Barclays Group.**
>>>>>
>>>>> _______________________________________________
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Connect to me at http://www.facebook.com/dhruba
>>>
>>
>>
>  _______________________________________________
>
>
>
> This e-mail may contain information that is confidential, privileged or
> otherwise protected from disclosure. If you are not an intended recipient of
> this e-mail, do not duplicate or redistribute it by any means. Please delete
> it and any attachments and notify the sender that you have received it in
> error. Unless specifically indicated, this e-mail is not an offer to buy or
> sell or a solicitation to buy or sell any securities, investment products or
> other financial product or service, an official confirmation of any
> transaction, or an official statement of Barclays. Any views or opinions
> presented are solely those of the author and do not necessarily represent
> those of Barclays. This e-mail is subject to terms available at the
> following link: www.barcap.com/emaildisclaimer. By messaging with Barclays
> you consent to the foregoing.  Barclays Capital is the investment banking
> division of Barclays Bank PLC, a company registered in England (number
> 1026167) with its registered office at 1 Churchill Place, London, E14 5HP.
> This email may relate to or be sent from other members of the Barclays
> Group.**
>
> _______________________________________________
>

RE: Exponential performance decay when inserting large number of blocks

Posted by Zl...@barclayscapital.com.
Alex, Dhruba
 
I repeated the experiment increasing the block size to 32k.  Still doing
8 inserts in parallel, file size now is 512 MB; 11 datanodes.  I was
also running iostat on one of the datanodes.  Did not notice anything
that would explain an exponential slowdown.  There was more activity
while the inserts were active but far from the limits of the disk
system.
 
There is definitely some improvement - I was able to finish 3 rounds of
inserts for a total of 12GB.  However, the shape is still logarithmic.
Unfortunately, I am constrained by disk space and cannot test a block
size that is even close to the real-world sizes.
 
Attached are the collected metrics and a graph comparing the three
inserts.  As before, you'll notice gaps in the metrics between runs of
the insert script which have been edited out of the graph.
 
Best Regards,
Zlatin Balevsky
 
 


________________________________

From: alex kamil [mailto:alex.kamil@gmail.com] 
Sent: Wednesday, January 13, 2010 7:03 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Exponential performance decay when inserting large number
of blocks


Zlatin, nevermind, just noticed that you're measuring the dfs insert
time.
1k blocks are way too small anyway


On Wed, Jan 13, 2010 at 6:08 PM, alex kamil <al...@gmail.com>
wrote:


	Zlatin, 
	i dont know what is the nature of the job that you are running
but it looks like you are hitting an io bottleneck, try to multiple the
size of input data and test with 3-5GB at least, by changing the block
size in this case you can increase the number of maps and scale your
app. (Round the block size to a multiple of 512)
	i attached some results from my experiments with KNN sclaing on
4 nodes cluster, it didnt scale linearly but it wasn't too bad, it
sometimes depends on the algorithm, with other algos, like PCA/SVD the
scaling was linear  
	also note that sometimes the bottleneck in in Reduce step, i
added a Combiner between mapper and reducer in my code and it affected
scalability significantly.
	
	let me know if you have any questions
	Alex Kamil
	ak2834@columbia.edu 


	On Wed, Jan 13, 2010 at 5:35 PM, Dhruba Borthakur <
dhruba@gmail.com> wrote:
	

		Another thing to observe is the rate of IO on the
datanodes. Maybe u can do a sar/iostat on the datanodes and see if the
ddatanode devices show an increase in activity while inserting the last
lot of blocks. One posiblity is that the OS cache on the datanodes
cached most of the data from the first few runs, but when more and more
data started arriving on the datanode it triggered more flushing of OS
buffers. (on the datanode).
		
		thanks,
		dhruba 



		On Wed, Jan 13, 2010 at 2:18 PM, Todd Lipcon <
todd@cloudera.com> wrote:
		

			Hey Zlatin, 

			Thanks for the explanation and the additional
data. I'm a bit busy today but will try to go through the data and
reproduce the results later this week.

			-Todd 


			On Wed, Jan 13, 2010 at 2:07 PM, <
Zlatin.Balevsky@barclayscapital.com> wrote:
			

				Todd,
				 
				I used a shell script that launched 8
instances of the bin/hadoop fs -put utility.  After all 8 processes were
done and I verified through the web ui that the files were inserted, I
re-launched the script manually again.  That is why you'll notice that
in the metrics there are two short periods without any activity (I
edited those out from the graph).  There were occasional
NotReplicatedYet exceptions in the logs of those processes, but they
were occurring at constant rate.
				 
				I did not run a profiler, but that will
eventually be the next step.  I'm attaching the metrics from the
namenode and one of the datanodes from the experiment with 4 datanodes.
They were recorded every 10 seconds.  Heap size for all processes is
2GB, and while there was occasional CPU usage on the Namenode it was
never 100%.  (and there are plenty of cores).
				 
				Ultimately the block size will be much
larger than the default as the total data will be in the 2^(well over
50) range.  With this test I am trying to determine if there are any
bottlenecks at the NameNode  component.
				 
				Best Regards,
				Zlatin Balevsky
				 
________________________________

				From: Todd Lipcon [mailto:
todd@cloudera.com] 
				Sent: Wednesday, January 13, 2010 4:34
PM
				To: hdfs-user@hadoop.apache.org
				Subject: Re: Exponential performance
decay when inserting large number of blocks
				
				
				Also, if you have the program you used
to do the insertions, and could attach it, I'd be interested in trying
to replicate this on a test cluster. If you can't redistribute it, I can
start from scratch, but would be easier to run yours. 

				Thanks
				-Todd
				
				
				On Wed, Jan 13, 2010 at 1:31 PM, Todd
Lipcon <to...@cloudera.com> wrote:
				

				Hi Zlatin, 

				This is a very interesting test you've
run, and certainly not expected results. I know of many clusters happily
chugging along with millions of blocks, so problems at 400K are very
strange. By any chance were you able to collect profiling information
from the NameNode while running this test?

				That said, I hope you've set the block
size to 1KB for the purpose of this test and not because you expect to
run that in production. Recommended block sizes are at least 64MB and
often 128MB or 256MB for larger clusters.

				Thanks
				-Todd

				On Wed, Jan 13, 2010 at 1:21 PM, <
Zlatin.Balevsky@barclayscapital.com> wrote:
				

				Greetings,
				
				I am testing how HDFS scales with very
large number of blocks.  I did
				the following setup:
				
				Set the default blocks size to 1KB
				Started 8 insert processes, each
inserting a 16MB file
				Repeated the insert 3 times, keeping the
already inserted files in HDFS
				Repeated the entire experiment on one
cluster with 4 and another with 11
				identical datanodes (allocated through
HOD)
				
				Results:
				The first 128MB (2^18 blocks) insert
finished in 5 minutes.  The second
				in 12 minutes.  The third didn't finish
within 1 hour.  The 11-node
				cluster was marginally faster.
				
				Throughout this I was storing all
available metrics.  There were no
				signs of insufficient memory on any of
the nodes; CPU usage and garbage
				collections were constant throughout.
If anyone is interested I can
				provide the recorded metrics.  I've
attached a chart that looks clearly
				logarithmic.
				
				Can anyone please point to what could be
the bottleneck here?  I'm
				evaluating HDFS for usage scenarios
requiring 2^(a lot more than 18)
				blocks.
				
				Best Regards,  <<insertion_rate_4_and_11_datanodes.JPG>>
				Zlatin Balevsky
				
	
_______________________________________________
				
				This e-mail may contain information that
is confidential, privileged or otherwise protected from disclosure. If
you are not an intended recipient of this e-mail, do not duplicate or
redistribute it by any means. Please delete it and any attachments and
notify the sender that you have received it in error. Unless
specifically indicated, this e-mail is not an offer to buy or sell or a
solicitation to buy or sell any securities, investment products or other
financial product or service, an official confirmation of any
transaction, or an official statement of Barclays. Any views or opinions
presented are solely those of the author and do not necessarily
represent those of Barclays. This e-mail is subject to terms available
at the following link: www.barcap.com/emaildisclaimer. By messaging with
Barclays you consent to the foregoing.  Barclays Capital is the
investment banking division of Barclays Bank PLC, a company registered
in England (number 1026167) with its registered office at 1 Churchill
Place, London, E14 5HP.  This email may relate to or be sent from other
members of the Barclays Group.
	
_______________________________________________
				



	
_______________________________________________

				 

				This e-mail may contain information that
is confidential, privileged or otherwise protected from disclosure. If
you are not an intended recipient of this e-mail, do not duplicate or
redistribute it by any means. Please delete it and any attachments and
notify the sender that you have received it in error. Unless
specifically indicated, this e-mail is not an offer to buy or sell or a
solicitation to buy or sell any securities, investment products or other
financial product or service, an official confirmation of any
transaction, or an official statement of Barclays. Any views or opinions
presented are solely those of the author and do not necessarily
represent those of Barclays. This e-mail is subject to terms available
at the following link: www.barcap.com/emaildisclaimer. By messaging with
Barclays you consent to the foregoing.  Barclays Capital is the
investment banking division of Barclays Bank PLC, a company registered
in England (number 1026167) with its registered office at 1 Churchill
Place, London, E14 5HP.  This email may relate to or be sent from other
members of the Barclays Group.

	
_______________________________________________





		-- 
		Connect to me at http://www.facebook.com/dhruba
		




_______________________________________________

This e-mail may contain information that is confidential, privileged or otherwise protected from disclosure. If you are not an intended recipient of this e-mail, do not duplicate or redistribute it by any means. Please delete it and any attachments and notify the sender that you have received it in error. Unless specifically indicated, this e-mail is not an offer to buy or sell or a solicitation to buy or sell any securities, investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Barclays. Any views or opinions presented are solely those of the author and do not necessarily represent those of Barclays. This e-mail is subject to terms available at the following link: www.barcap.com/emaildisclaimer. By messaging with Barclays you consent to the foregoing.  Barclays Capital is the investment banking division of Barclays Bank PLC, a company registered in England (number 1026167) with its registered office at 1 Churchill Place, London, E14 5HP.  This email may relate to or be sent from other members of the Barclays Group.
_______________________________________________

Re: Exponential performance decay when inserting large number of blocks

Posted by alex kamil <al...@gmail.com>.
Zlatin, nevermind, just noticed that you're measuring the dfs insert time.
1k blocks are way too small anyway

On Wed, Jan 13, 2010 at 6:08 PM, alex kamil <al...@gmail.com> wrote:

> Zlatin,
> i dont know what is the nature of the job that you are running but it looks
> like you are hitting an io bottleneck, try to multiply the size of input
> data and test with 3-5GB at least, by changing the block size in this case
> you can increase the number of maps and scale your app. (Round the block
> size to a multiple of 512)
> i attached some results from my experiments with KNN scaling on 4 nodes
> cluster, it didn't scale linearly but it wasn't too bad, it sometimes depends
> on the algorithm, with other algos, like PCA/SVD the scaling was linear
> also note that sometimes the bottleneck is in the Reduce step, i added a
> Combiner between mapper and reducer in my code and it affected scalability
> significantly.
>
> let me know if you have any questions
> Alex Kamil
> ak2834@columbia.edu
>
>
> On Wed, Jan 13, 2010 at 5:35 PM, Dhruba Borthakur <dh...@gmail.com>wrote:
>
>> Another thing to observe is the rate of IO on the datanodes. Maybe u can
>> do a sar/iostat on the datanodes and see if the datanode devices show an
>> increase in activity while inserting the last lot of blocks. One possibility
>> is that the OS cache on the datanodes cached most of the data from the first
>> few runs, but when more and more data started arriving on the datanode it
>> triggered more flushing of OS buffers. (on the datanode).
>>
>> thanks,
>> dhruba
>>
>>
>>
>> On Wed, Jan 13, 2010 at 2:18 PM, Todd Lipcon <to...@cloudera.com> wrote:
>>
>>> Hey Zlatin,
>>>
>>> Thanks for the explanation and the additional data. I'm a bit busy today
>>> but will try to go through the data and reproduce the results later this
>>> week.
>>>
>>> -Todd
>>>
>>>
>>> On Wed, Jan 13, 2010 at 2:07 PM, <Zl...@barclayscapital.com>wrote:
>>>
>>>>  Todd,
>>>>
>>>> I used a shell script that launched 8 instances of the bin/hadoop fs
>>>> -put utility.  After all 8 processes were done and I verified through the web
>>>> ui that the files were inserted, I re-launched the script manually again.
>>>> That is why you'll notice that in the metrics there are two short periods
>>>> without any activity (I edited those out from the graph).  There were
>>>> occasional NotReplicatedYet exceptions in the logs of those processes, but
>>>> they were occurring at constant rate.
>>>>
>>>> I did not run a profiler, but that will eventually be the next step.
>>>> I'm attaching the metrics from the namenode and one of the datanodes from
>>>> the experiment with 4 datanodes.  They were recorded every 10 seconds.  Heap
>>>> size for all processes is 2GB, and while there was occasional CPU usage on
>>>> the Namenode it was never 100%.  (and there are plenty of cores).
>>>>
>>>> Ultimately the block size will be much larger than the default as the
>>>> total data will be in the 2^(well over 50) range.  With this test I am
>>>> trying to determine if there are any bottlenecks at the NameNode  component.
>>>>
>>>> Best Regards,
>>>> Zlatin Balevsky
>>>>
>>>>  ------------------------------
>>>> *From:* Todd Lipcon [mailto:todd@cloudera.com]
>>>> *Sent:* Wednesday, January 13, 2010 4:34 PM
>>>> *To:* hdfs-user@hadoop.apache.org
>>>> *Subject:* Re: Exponential performance decay when inserting large
>>>> number of blocks
>>>>
>>>> Also, if you have the program you used to do the insertions, and could
>>>> attach it, I'd be interested in trying to replicate this on a test cluster.
>>>> If you can't redistribute it, I can start from scratch, but would be easier
>>>> to run yours.
>>>>
>>>> Thanks
>>>> -Todd
>>>>
>>>> On Wed, Jan 13, 2010 at 1:31 PM, Todd Lipcon <to...@cloudera.com> wrote:
>>>>
>>>>> Hi Zlatin,
>>>>>
>>>>> This is a very interesting test you've run, and certainly not expected
>>>>> results. I know of many clusters happily chugging along with millions of
>>>>> blocks, so problems at 400K are very strange. By any chance were you able to
>>>>> collect profiling information from the NameNode while running this test?
>>>>>
>>>>> That said, I hope you've set the block size to 1KB for the purpose of
>>>>> this test and not because you expect to run that in production. Recommended
>>>>> block sizes are at least 64MB and often 128MB or 256MB for larger clusters.
>>>>>
>>>>> Thanks
>>>>> -Todd
>>>>>
>>>>> On Wed, Jan 13, 2010 at 1:21 PM, <Zl...@barclayscapital.com>wrote:
>>>>>
>>>>>> Greetings,
>>>>>>
>>>>>> I am testing how HDFS scales with very large number of blocks.  I did
>>>>>> the following setup:
>>>>>>
>>>>>> Set the default blocks size to 1KB
>>>>>> Started 8 insert processes, each inserting a 16MB file
>>>>>> Repeated the insert 3 times, keeping the already inserted files in
>>>>>> HDFS
>>>>>> Repeated the entire experiment on one cluster with 4 and another with
>>>>>> 11
>>>>>> identical datanodes (allocated through HOD)
>>>>>>
>>>>>> Results:
>>>>>> The first 128MB (2^18 blocks) insert finished in 5 minutes.  The
>>>>>> second
>>>>>> in 12 minutes.  The third didn't finish within 1 hour.  The 11-node
>>>>>> cluster was marginally faster.
>>>>>>
>>>>>> Throughout this I was storing all available metrics.  There were no
>>>>>> signs of insufficient memory on any of the nodes; CPU usage and
>>>>>> garbage
>>>>>> collections were constant throughout.  If anyone is interested I can
>>>>>> provide the recorded metrics.  I've attached a chart that looks
>>>>>> clearly
>>>>>> logarithmic.
>>>>>>
>>>>>> Can anyone please point to what could be the bottleneck here?  I'm
>>>>>> evaluating HDFS for usage scenarios requiring 2^(a lot more than 18)
>>>>>> blocks.
>>>>>>
>>>>>> Best Regards,  <<insertion_rate_4_and_11_datanodes.JPG>>
>>>>>> Zlatin Balevsky
>>>>>>
>>>>>> _______________________________________________
>>>>>>
>>>>>> This e-mail may contain information that is confidential, privileged
>>>>>> or otherwise protected from disclosure. If you are not an intended recipient
>>>>>> of this e-mail, do not duplicate or redistribute it by any means. Please
>>>>>> delete it and any attachments and notify the sender that you have received
>>>>>> it in error. Unless specifically indicated, this e-mail is not an offer to
>>>>>> buy or sell or a solicitation to buy or sell any securities, investment
>>>>>> products or other financial product or service, an official confirmation of
>>>>>> any transaction, or an official statement of Barclays. Any views or opinions
>>>>>> presented are solely those of the author and do not necessarily represent
>>>>>> those of Barclays. This e-mail is subject to terms available at the
>>>>>> following link: www.barcap.com/emaildisclaimer. By messaging with
>>>>>> Barclays you consent to the foregoing.  Barclays Capital is the investment
>>>>>> banking division of Barclays Bank PLC, a company registered in England
>>>>>> (number 1026167) with its registered office at 1 Churchill Place, London,
>>>>>> E14 5HP.  This email may relate to or be sent from other members of the
>>>>>> Barclays Group.
>>>>>> _______________________________________________
>>>>>>
>>>>>
>>>>>
>>>>  _______________________________________________
>>>>
>>>>
>>>>
>>>> This e-mail may contain information that is confidential, privileged or
>>>> otherwise protected from disclosure. If you are not an intended recipient of
>>>> this e-mail, do not duplicate or redistribute it by any means. Please delete
>>>> it and any attachments and notify the sender that you have received it in
>>>> error. Unless specifically indicated, this e-mail is not an offer to buy or
>>>> sell or a solicitation to buy or sell any securities, investment products or
>>>> other financial product or service, an official confirmation of any
>>>> transaction, or an official statement of Barclays. Any views or opinions
>>>> presented are solely those of the author and do not necessarily represent
>>>> those of Barclays. This e-mail is subject to terms available at the
>>>> following link: www.barcap.com/emaildisclaimer. By messaging with
>>>> Barclays you consent to the foregoing.  Barclays Capital is the
>>>> investment banking division of Barclays Bank PLC, a company registered in
>>>> England (number 1026167) with its registered office at 1 Churchill Place,
>>>> London, E14 5HP.  This email may relate to or be sent from other
>>>> members of the Barclays Group.**
>>>>
>>>> _______________________________________________
>>>>
>>>
>>>
>>
>>
>> --
>> Connect to me at http://www.facebook.com/dhruba
>>
>
>

Re: Exponential performance decay when inserting large number of blocks

Posted by alex kamil <al...@gmail.com>.
Zlatin,
I don't know the nature of the job that you are running, but it looks
like you are hitting an IO bottleneck; try to multiply the size of the input
data and test with 3-5GB at least. By changing the block size in this case
you can increase the number of maps and scale your app. (Round the block
size to a multiple of 512.)
I attached some results from my experiments with KNN scaling on a 4-node
cluster; it didn't scale linearly, but it wasn't too bad. It sometimes depends
on the algorithm; with other algos, like PCA/SVD, the scaling was linear.
Also note that sometimes the bottleneck is in the Reduce step; I added a
Combiner between mapper and reducer in my code and it affected scalability
significantly.

let me know if you have any questions
Alex Kamil
ak2834@columbia.edu

On Wed, Jan 13, 2010 at 5:35 PM, Dhruba Borthakur <dh...@gmail.com> wrote:

> Another thing to observe is the rate of IO on the datanodes. Maybe u can do
> a sar/iostat on the datanodes and see if the datanode devices show an
> increase in activity while inserting the last lot of blocks. One possibility
> is that the OS cache on the datanodes cached most of the data from the first
> few runs, but when more and more data started arriving on the datanode it
> triggered more flushing of OS buffers. (on the datanode).
>
> thanks,
> dhruba
>
>
>
> On Wed, Jan 13, 2010 at 2:18 PM, Todd Lipcon <to...@cloudera.com> wrote:
>
>> Hey Zlatin,
>>
>> Thanks for the explanation and the additional data. I'm a bit busy today
>> but will try to go through the data and reproduce the results later this
>> week.
>>
>> -Todd
>>
>>
>> On Wed, Jan 13, 2010 at 2:07 PM, <Zl...@barclayscapital.com>wrote:
>>
>>>  Todd,
>>>
>>> I used a shell script that launched 8 instances of the bin/hadoop fs -put
>>> utility.  After all 8 processes were done and I verified through the web ui
>>> that the files were inserted, I re-launched the script manually again.  That
>>> is why you'll notice that in the metrics there are two short periods without
>>> any activity (I edited those out from the graph).  There were occasional
>>> NotReplicatedYet exceptions in the logs of those processes, but they were
>>> occurring at constant rate.
>>>
>>> I did not run a profiler, but that will eventually be the next step.  I'm
>>> attaching the metrics from the namenode and one of the datanodes from the
>>> experiment with 4 datanodes.  They were recorded every 10 seconds.  Heap
>>> size for all processes is 2GB, and while there was occasional CPU usage on
>>> the Namenode it was never 100%.  (and there are plenty of cores).
>>>
>>> Ultimately the block size will be much larger than the default as the
>>> total data will be in the 2^(well over 50) range.  With this test I am
>>> trying to determine if there are any bottlenecks at the NameNode  component.
>>>
>>> Best Regards,
>>> Zlatin Balevsky
>>>
>>>  ------------------------------
>>> *From:* Todd Lipcon [mailto:todd@cloudera.com]
>>> *Sent:* Wednesday, January 13, 2010 4:34 PM
>>> *To:* hdfs-user@hadoop.apache.org
>>> *Subject:* Re: Exponential performance decay when inserting large number
>>> of blocks
>>>
>>> Also, if you have the program you used to do the insertions, and could
>>> attach it, I'd be interested in trying to replicate this on a test cluster.
>>> If you can't redistribute it, I can start from scratch, but would be easier
>>> to run yours.
>>>
>>> Thanks
>>> -Todd
>>>
>>> On Wed, Jan 13, 2010 at 1:31 PM, Todd Lipcon <to...@cloudera.com> wrote:
>>>
>>>> Hi Zlatin,
>>>>
>>>> This is a very interesting test you've run, and certainly not expected
>>>> results. I know of many clusters happily chugging along with millions of
>>>> blocks, so problems at 400K are very strange. By any chance were you able to
>>>> collect profiling information from the NameNode while running this test?
>>>>
>>>> That said, I hope you've set the block size to 1KB for the purpose of
>>>> this test and not because you expect to run that in production. Recommended
>>>> block sizes are at least 64MB and often 128MB or 256MB for larger clusters.
>>>>
>>>> Thanks
>>>> -Todd
>>>>
>>>> On Wed, Jan 13, 2010 at 1:21 PM, <Zl...@barclayscapital.com>wrote:
>>>>
>>>>> Greetings,
>>>>>
>>>>> I am testing how HDFS scales with very large number of blocks.  I did
>>>>> the following setup:
>>>>>
>>>>> Set the default blocks size to 1KB
>>>>> Started 8 insert processes, each inserting a 16MB file
>>>>> Repeated the insert 3 times, keeping the already inserted files in HDFS
>>>>> Repeated the entire experiment on one cluster with 4 and another with
>>>>> 11
>>>>> identical datanodes (allocated through HOD)
>>>>>
>>>>> Results:
>>>>> The first 128MB (2^18 blocks) insert finished in 5 minutes.  The second
>>>>> in 12 minutes.  The third didn't finish within 1 hour.  The 11-node
>>>>> cluster was marginally faster.
>>>>>
>>>>> Throughout this I was storing all available metrics.  There were no
>>>>> signs of insufficient memory on any of the nodes; CPU usage and garbage
>>>>> collections were constant throughout.  If anyone is interested I can
>>>>> provide the recorded metrics.  I've attached a chart that looks clearly
>>>>> logarithmic.
>>>>>
>>>>> Can anyone please point to what could be the bottleneck here?  I'm
>>>>> evaluating HDFS for usage scenarios requiring 2^(a lot more than 18)
>>>>> blocks.
>>>>>
>>>>> Best Regards,  <<insertion_rate_4_and_11_datanodes.JPG>>
>>>>> Zlatin Balevsky
>>>>>
>>>>> _______________________________________________
>>>>>
>>>>> This e-mail may contain information that is confidential, privileged or
>>>>> otherwise protected from disclosure. If you are not an intended recipient of
>>>>> this e-mail, do not duplicate or redistribute it by any means. Please delete
>>>>> it and any attachments and notify the sender that you have received it in
>>>>> error. Unless specifically indicated, this e-mail is not an offer to buy or
>>>>> sell or a solicitation to buy or sell any securities, investment products or
>>>>> other financial product or service, an official confirmation of any
>>>>> transaction, or an official statement of Barclays. Any views or opinions
>>>>> presented are solely those of the author and do not necessarily represent
>>>>> those of Barclays. This e-mail is subject to terms available at the
>>>>> following link: www.barcap.com/emaildisclaimer. By messaging with
>>>>> Barclays you consent to the foregoing.  Barclays Capital is the investment
>>>>> banking division of Barclays Bank PLC, a company registered in England
>>>>> (number 1026167) with its registered office at 1 Churchill Place, London,
>>>>> E14 5HP.  This email may relate to or be sent from other members of the
>>>>> Barclays Group.
>>>>> _______________________________________________
>>>>>
>>>>
>>>>
>>>  _______________________________________________
>>>
>>>
>>>
>>> This e-mail may contain information that is confidential, privileged or
>>> otherwise protected from disclosure. If you are not an intended recipient of
>>> this e-mail, do not duplicate or redistribute it by any means. Please delete
>>> it and any attachments and notify the sender that you have received it in
>>> error. Unless specifically indicated, this e-mail is not an offer to buy or
>>> sell or a solicitation to buy or sell any securities, investment products or
>>> other financial product or service, an official confirmation of any
>>> transaction, or an official statement of Barclays. Any views or opinions
>>> presented are solely those of the author and do not necessarily represent
>>> those of Barclays. This e-mail is subject to terms available at the
>>> following link: www.barcap.com/emaildisclaimer. By messaging with
>>> Barclays you consent to the foregoing.  Barclays Capital is the
>>> investment banking division of Barclays Bank PLC, a company registered in
>>> England (number 1026167) with its registered office at 1 Churchill Place,
>>> London, E14 5HP.  This email may relate to or be sent from other members
>>> of the Barclays Group.**
>>>
>>> _______________________________________________
>>>
>>
>>
>
>
> --
> Connect to me at http://www.facebook.com/dhruba
>

Re: Exponential performance decay when inserting large number of blocks

Posted by Dhruba Borthakur <dh...@gmail.com>.
Another thing to observe is the rate of IO on the datanodes. Maybe you can run
sar/iostat on the datanodes and see if the datanode devices show an
increase in activity while inserting the last lot of blocks. One possibility
is that the OS cache on the datanodes cached most of the data from the first
few runs, but when more and more data started arriving on the datanode it
triggered more flushing of OS buffers (on the datanode).
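
For example, left running on one datanode for the duration of an insert round
(10-second samples, to line up with the recorded metrics):

    # extended per-device statistics; watch await and %util across the rounds
    iostat -x 10

    # or the sysstat equivalent for block devices
    sar -d 10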

thanks,
dhruba


On Wed, Jan 13, 2010 at 2:18 PM, Todd Lipcon <to...@cloudera.com> wrote:

> Hey Zlatin,
>
> Thanks for the explanation and the additional data. I'm a bit busy today
> but will try to go through the data and reproduce the results later this
> week.
>
> -Todd
>
>
> On Wed, Jan 13, 2010 at 2:07 PM, <Zl...@barclayscapital.com>wrote:
>
>>  Todd,
>>
>> I used a shell script that launched 8 instances of the bin/hadoop fs -put
>> utility.  After all 8 processes were done and I verified through the web ui
>> that the files were inserted, I re-launched the script manually again.  That
>> is why you'll notice that in the metrics there are two short periods without
>> any activity (I edited those out from the graph).  There were occasional
>> NotReplicatedYet exceptions in the logs of those processes, but they were
>> occurring at constant rate.
>>
>> I did not run a profiler, but that will eventually be the next step.  I'm
>> attaching the metrics from the namenode and one of the datanodes from the
>> experiment with 4 datanodes.  They were recorded every 10 seconds.  Heap
>> size for all processes is 2GB, and while there was occasional CPU usage on
>> the Namenode it was never 100%.  (and there are plenty of cores).
>>
>> Ultimately the block size will be much larger than the default as the
>> total data will be in the 2^(well over 50) range.  With this test I am
>> trying to determine if there are any bottlenecks at the NameNode  component.
>>
>> Best Regards,
>> Zlatin Balevsky
>>
>>  ------------------------------
>> *From:* Todd Lipcon [mailto:todd@cloudera.com]
>> *Sent:* Wednesday, January 13, 2010 4:34 PM
>> *To:* hdfs-user@hadoop.apache.org
>> *Subject:* Re: Exponential performance decay when inserting large number
>> of blocks
>>
>> Also, if you have the program you used to do the insertions, and could
>> attach it, I'd be interested in trying to replicate this on a test cluster.
>> If you can't redistribute it, I can start from scratch, but would be easier
>> to run yours.
>>
>> Thanks
>> -Todd
>>
>> On Wed, Jan 13, 2010 at 1:31 PM, Todd Lipcon <to...@cloudera.com> wrote:
>>
>>> Hi Zlatin,
>>>
>>> This is a very interesting test you've run, and certainly not expected
>>> results. I know of many clusters happily chugging along with millions of
>>> blocks, so problems at 400K are very strange. By any chance were you able to
>>> collect profiling information from the NameNode while running this test?
>>>
>>> That said, I hope you've set the block size to 1KB for the purpose of
>>> this test and not because you expect to run that in production. Recommended
>>> block sizes are at least 64MB and often 128MB or 256MB for larger clusters.
>>>
>>> Thanks
>>> -Todd
>>>
>>> On Wed, Jan 13, 2010 at 1:21 PM, <Zl...@barclayscapital.com>wrote:
>>>
>>>> Greetings,
>>>>
>>>> I am testing how HDFS scales with very large number of blocks.  I did
>>>> the following setup:
>>>>
>>>> Set the default blocks size to 1KB
>>>> Started 8 insert processes, each inserting a 16MB file
>>>> Repeated the insert 3 times, keeping the already inserted files in HDFS
>>>> Repeated the entire experiment on one cluster with 4 and another with 11
>>>> identical datanodes (allocated through HOD)
>>>>
>>>> Results:
>>>> The first 128MB (2^18 blocks) insert finished in 5 minutes.  The second
>>>> in 12 minutes.  The third didn't finish within 1 hour.  The 11-node
>>>> cluster was marginally faster.
>>>>
>>>> Throughout this I was storing all available metrics.  There were no
>>>> signs of insufficient memory on any of the nodes; CPU usage and garbage
>>>> collections were constant throughout.  If anyone is interested I can
>>>> provide the recorded metrics.  I've attached a chart that looks clearly
>>>> logarithmic.
>>>>
>>>> Can anyone please point to what could be the bottleneck here?  I'm
>>>> evaluating HDFS for usage scenarios requiring 2^(a lot more than 18)
>>>> blocks.
>>>>
>>>> Best Regards,  <<insertion_rate_4_and_11_datanodes.JPG>>
>>>> Zlatin Balevsky
>>>>
>>>> _______________________________________________
>>>>
>>>> This e-mail may contain information that is confidential, privileged or
>>>> otherwise protected from disclosure. If you are not an intended recipient of
>>>> this e-mail, do not duplicate or redistribute it by any means. Please delete
>>>> it and any attachments and notify the sender that you have received it in
>>>> error. Unless specifically indicated, this e-mail is not an offer to buy or
>>>> sell or a solicitation to buy or sell any securities, investment products or
>>>> other financial product or service, an official confirmation of any
>>>> transaction, or an official statement of Barclays. Any views or opinions
>>>> presented are solely those of the author and do not necessarily represent
>>>> those of Barclays. This e-mail is subject to terms available at the
>>>> following link: www.barcap.com/emaildisclaimer. By messaging with
>>>> Barclays you consent to the foregoing.  Barclays Capital is the investment
>>>> banking division of Barclays Bank PLC, a company registered in England
>>>> (number 1026167) with its registered office at 1 Churchill Place, London,
>>>> E14 5HP.  This email may relate to or be sent from other members of the
>>>> Barclays Group.
>>>> _______________________________________________
>>>>
>>>
>>>
>>  _______________________________________________
>>
>>
>>
>> This e-mail may contain information that is confidential, privileged or
>> otherwise protected from disclosure. If you are not an intended recipient of
>> this e-mail, do not duplicate or redistribute it by any means. Please delete
>> it and any attachments and notify the sender that you have received it in
>> error. Unless specifically indicated, this e-mail is not an offer to buy or
>> sell or a solicitation to buy or sell any securities, investment products or
>> other financial product or service, an official confirmation of any
>> transaction, or an official statement of Barclays. Any views or opinions
>> presented are solely those of the author and do not necessarily represent
>> those of Barclays. This e-mail is subject to terms available at the
>> following link: www.barcap.com/emaildisclaimer. By messaging with
>> Barclays you consent to the foregoing.  Barclays Capital is the
>> investment banking division of Barclays Bank PLC, a company registered in
>> England (number 1026167) with its registered office at 1 Churchill Place,
>> London, E14 5HP.  This email may relate to or be sent from other members
>> of the Barclays Group.**
>>
>> _______________________________________________
>>
>
>


-- 
Connect to me at http://www.facebook.com/dhruba

Re: Exponential performance decay when inserting large number of blocks

Posted by Todd Lipcon <to...@cloudera.com>.
Hey Zlatin,

Thanks for the explanation and the additional data. I'm a bit busy today but
will try to go through the data and reproduce the results later this week.

-Todd

On Wed, Jan 13, 2010 at 2:07 PM, <Zl...@barclayscapital.com>wrote:

>  Todd,
>
> I used a shell script that launched 8 instances of the bin/hadoop fs -put
> utility.  After all 8 processes were done and I verified through the web ui
> that the files were inserted, I re-launched the script manually again.  That
> is why you'll notice that in the metrics there are two short periods without
> any activity (I edited those out from the graph).  There were occasional
> NotReplicatedYet exceptions in the logs of those processes, but they were
> occurring at constant rate.
>
> I did not run a profiler, but that will eventually be the next step.  I'm
> attaching the metrics from the namenode and one of the datanodes from the
> experiment with 4 datanodes.  They were recorded every 10 seconds.  Heap
> size for all processes is 2GB, and while there was occasional CPU usage on
> the Namenode it was never 100%.  (and there are plenty of cores).
>
> Ultimately the block size will be much larger than the default as the total
> data will be in the 2^(well over 50) range.  With this test I am trying to
> determine if there are any bottlenecks at the NameNode  component.
>
> Best Regards,
> Zlatin Balevsky
>
>  ------------------------------
> *From:* Todd Lipcon [mailto:todd@cloudera.com]
> *Sent:* Wednesday, January 13, 2010 4:34 PM
> *To:* hdfs-user@hadoop.apache.org
> *Subject:* Re: Exponential performance decay when inserting large number
> of blocks
>
> Also, if you have the program you used to do the insertions, and could
> attach it, I'd be interested in trying to replicate this on a test cluster.
> If you can't redistribute it, I can start from scratch, but would be easier
> to run yours.
>
> Thanks
> -Todd
>
> On Wed, Jan 13, 2010 at 1:31 PM, Todd Lipcon <to...@cloudera.com> wrote:
>
>> Hi Zlatin,
>>
>> This is a very interesting test you've run, and certainly not expected
>> results. I know of many clusters happily chugging along with millions of
>> blocks, so problems at 400K are very strange. By any chance were you able to
>> collect profiling information from the NameNode while running this test?
>>
>> That said, I hope you've set the block size to 1KB for the purpose of this
>> test and not because you expect to run that in production. Recommended block
>> sizes are at least 64MB and often 128MB or 256MB for larger clusters.
>>
>> Thanks
>> -Todd
>>
>> On Wed, Jan 13, 2010 at 1:21 PM, <Zl...@barclayscapital.com>wrote:
>>
>>> Greetings,
>>>
>>> I am testing how HDFS scales with very large number of blocks.  I did
>>> the following setup:
>>>
>>> Set the default blocks size to 1KB
>>> Started 8 insert processes, each inserting a 16MB file
>>> Repeated the insert 3 times, keeping the already inserted files in HDFS
>>> Repeated the entire experiment on one cluster with 4 and another with 11
>>> identical datanodes (allocated through HOD)
>>>
>>> Results:
>>> The first 128MB (2^18 blocks) insert finished in 5 minutes.  The second
>>> in 12 minutes.  The third didn't finish within 1 hour.  The 11-node
>>> cluster was marginally faster.
>>>
>>> Throughout this I was storing all available metrics.  There were no
>>> signs of insufficient memory on any of the nodes; CPU usage and garbage
>>> collections were constant throughout.  If anyone is interested I can
>>> provide the recorded metrics.  I've attached a chart that looks clearly
>>> logarithmic.
>>>
>>> Can anyone please point to what could be the bottleneck here?  I'm
>>> evaluating HDFS for usage scenarios requiring 2^(a lot more than 18)
>>> blocks.
>>>
>>> Best Regards,  <<insertion_rate_4_and_11_datanodes.JPG>>
>>> Zlatin Balevsky
>>>
>>> _______________________________________________
>>>
>>> This e-mail may contain information that is confidential, privileged or
>>> otherwise protected from disclosure. If you are not an intended recipient of
>>> this e-mail, do not duplicate or redistribute it by any means. Please delete
>>> it and any attachments and notify the sender that you have received it in
>>> error. Unless specifically indicated, this e-mail is not an offer to buy or
>>> sell or a solicitation to buy or sell any securities, investment products or
>>> other financial product or service, an official confirmation of any
>>> transaction, or an official statement of Barclays. Any views or opinions
>>> presented are solely those of the author and do not necessarily represent
>>> those of Barclays. This e-mail is subject to terms available at the
>>> following link: www.barcap.com/emaildisclaimer. By messaging with
>>> Barclays you consent to the foregoing.  Barclays Capital is the investment
>>> banking division of Barclays Bank PLC, a company registered in England
>>> (number 1026167) with its registered office at 1 Churchill Place, London,
>>> E14 5HP.  This email may relate to or be sent from other members of the
>>> Barclays Group.
>>> _______________________________________________
>>>
>>
>>
>  _______________________________________________
>
>
>
> This e-mail may contain information that is confidential, privileged or
> otherwise protected from disclosure. If you are not an intended recipient of
> this e-mail, do not duplicate or redistribute it by any means. Please delete
> it and any attachments and notify the sender that you have received it in
> error. Unless specifically indicated, this e-mail is not an offer to buy or
> sell or a solicitation to buy or sell any securities, investment products or
> other financial product or service, an official confirmation of any
> transaction, or an official statement of Barclays. Any views or opinions
> presented are solely those of the author and do not necessarily represent
> those of Barclays. This e-mail is subject to terms available at the
> following link: www.barcap.com/emaildisclaimer. By messaging with Barclays
> you consent to the foregoing.  Barclays Capital is the investment banking
> division of Barclays Bank PLC, a company registered in England (number
> 1026167) with its registered office at 1 Churchill Place, London, E14 5HP.
> This email may relate to or be sent from other members of the Barclays
> Group.**
>
> _______________________________________________
>

RE: Exponential performance decay when inserting large number of blocks

Posted by Zl...@barclayscapital.com.
Todd,
 
I used a shell script that launched 8 instances of the bin/hadoop fs
-put utility.  After all 8 processes were done and I verified through the
web UI that the files were inserted, I re-launched the script manually
again.  That is why you'll notice that in the metrics there are two
short periods without any activity (I edited those out from the graph).
There were occasional NotReplicatedYet exceptions in the logs of those
processes, but they were occurring at a constant rate.
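
Roughly, each round amounted to something like the sketch below; the local
file names and HDFS paths are placeholders rather than the original script,
and dfs.block.size was already set to 1KB in the test's configuration:

    # one round: 8 concurrent puts of 16MB local files
    ROUND=$1
    for i in $(seq 1 8); do
      bin/hadoop fs -put /local/bench/file16MB_$i /bench/round${ROUND}/file_$i &
    done
    wait   # the next round was launched by hand only after all 8 puts finished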
 
I did not run a profiler, but that will eventually be the next step.
I'm attaching the metrics from the namenode and one of the datanodes
from the experiment with 4 datanodes.  They were recorded every 10
seconds.  Heap size for all processes is 2GB, and while there was
occasional CPU usage on the Namenode it was never 100%.  (and there are
plenty of cores).
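
For anyone who wants to capture the same data, a 10-second dump like that is
normally wired up through conf/hadoop-metrics.properties. A sketch; the
context and class names are the metrics-v1 ones from the 0.20 line, and the
output path is a placeholder:

    cat >> conf/hadoop-metrics.properties <<'EOF'
    # write DFS metrics to a local file every 10 seconds
    dfs.class=org.apache.hadoop.metrics.file.FileContext
    dfs.period=10
    dfs.fileName=/tmp/dfs_metrics.log
    EOF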
 
Ultimately the block size will be much larger than the default as the
total data will be in the 2^(well over 50) range.  With this test I am
trying to determine if there are any bottlenecks at the NameNode
component.
 
Best Regards,
Zlatin Balevsky
 
________________________________

From: Todd Lipcon [mailto:todd@cloudera.com] 
Sent: Wednesday, January 13, 2010 4:34 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Exponential performance decay when inserting large number
of blocks


Also, if you have the program you used to do the insertions, and could
attach it, I'd be interested in trying to replicate this on a test
cluster. If you can't redistribute it, I can start from scratch, but
would be easier to run yours. 

Thanks
-Todd



Re: Exponential performance decay when inserting large number of blocks

Posted by Todd Lipcon <to...@cloudera.com>.
Also, if you have the program you used to do the insertions, and could
attach it, I'd be interested in trying to replicate this on a test cluster.
If you can't redistribute it, I can start from scratch, but would be easier
to run yours.

Thanks
-Todd

On Wed, Jan 13, 2010 at 1:31 PM, Todd Lipcon <to...@cloudera.com> wrote:

> Hi Zlatin,
>
> This is a very interesting test you've run, and certainly not expected
> results. I know of many clusters happily chugging along with millions of
> blocks, so problems at 400K are very strange. By any chance were you able to
> collect profiling information from the NameNode while running this test?
>
> That said, I hope you've set the block size to 1KB for the purpose of this
> test and not because you expect to run that in production. Recommended block
> sizes are at least 64MB and often 128MB or 256MB for larger clusters.
>
> Thanks
> -Todd
>
> On Wed, Jan 13, 2010 at 1:21 PM, <Zl...@barclayscapital.com>wrote:
>
>> Greetings,
>>
>> I am testing how HDFS scales with very large number of blocks.  I did
>> the following setup:
>>
>> Set the default blocks size to 1KB
>> Started 8 insert processes, each inserting a 16MB file
>> Repeated the insert 3 times, keeping the already inserted files in HDFS
>> Repeated the entire experiment on one cluster with 4 and another with 11
>> identical datanodes (allocated through HOD)
>>
>> Results:
>> The first 128MB (2^18 blocks) insert finished in 5 minutes.  The second
>> in 12 minutes.  The third didn't finish within 1 hour.  The 11-node
>> cluster was marginally faster.
>>
>> Throughout this I was storing all available metrics.  There were no
>> signs of insufficient memory on any of the nodes; CPU usage and garbage
>> collections were constant throughout.  If anyone is interested I can
>> provide the recorded metrics.  I've attached a chart that looks clearly
>> logarithmic.
>>
>> Can anyone please point to what could be the bottleneck here?  I'm
>> evaluating HDFS for usage scenarios requiring 2^(a lot more than 18)
>> blocks.
>>
>> Best Regards, <<insertion_rate_4_and_11_datanodes.JPG>>
>> Zlatin Balevsky
>>
>
>