Posted to common-user@hadoop.apache.org by Ken Krugler <kk...@transpac.com> on 2011/09/02 02:27:25 UTC

Re: TestDFSIO failure

Hi Matt,

On Jun 20, 2011, at 1:46pm, GOEKE, MATTHEW (AG/1000) wrote:

> Has anyone else run into issues using output compression (in our case lzo) on TestDFSIO and it failing to be able to read the metrics file? I just assumed that it would use the correct decompression codec after it finishes but it always returns with a 'File not found' exception.

Yes, I've run into the same issue on 0.20.2 and CDH3u0.

I don't see any Jira issue that covers this problem, so unless I hear otherwise I'll file one.

The problem is that the post-job code doesn't handle fetching the <path>.deflate (or, in your case, <path>.lzo) file from HDFS and then decompressing it.
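For illustration only, here's a rough sketch of the kind of handling that's missing - reading the compressed metrics file back through whatever codec matches its extension. The class name and the path argument are made up for the example; the codec lookup uses Hadoop's CompressionCodecFactory, which picks the codec from the file suffix:

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class ReadCompressedMetrics {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // args[0] is the metrics file path, e.g. the <path>.lzo reported in the exception
        Path path = new Path(args[0]);
        FileSystem fs = path.getFileSystem(conf);

        // Pick a codec based on the file extension (.deflate, .lzo, ...); null means uncompressed
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        CompressionCodec codec = factory.getCodec(path);
        InputStream in = (codec == null)
            ? fs.open(path)
            : codec.createInputStream(fs.open(path));

        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        reader.close();
    }
}

This is essentially what "hadoop fs -text" does for you, which is why the workaround below doesn't need any recompiled code.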

> Is there a simple way around this without spending the time to recompile a cluster/codec specific version?


You can use "hadoop fs -text <path reported in exception>.lzo"

This will dump out the file, which looks like:

f:rate  171455.11
f:sqrate        2981174.8
l:size  10485760000
l:tasks 10
l:time  590537

If you take f:rate/1000/l:tasks, that should give you the average MB/sec.

E.g. for the example above, that works out to 171455/1000/10, or about 17 MB/sec.
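If you're running this regularly, the same arithmetic is easy to script. A minimal sketch (the class name is made up; it just applies the f:rate/1000/l:tasks formula above to lines piped in on stdin):

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class DfsioThroughput {
    public static void main(String[] args) throws Exception {
        double rate = 0.0;
        long tasks = 0;
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            // Lines look like "f:rate  171455.11" - key, whitespace, value
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) continue;
            if ("f:rate".equals(parts[0]))  rate  = Double.parseDouble(parts[1]);
            if ("l:tasks".equals(parts[0])) tasks = Long.parseLong(parts[1]);
        }
        if (tasks > 0) {
            System.out.printf("Average throughput: %.1f MB/sec%n", rate / 1000.0 / tasks);
        } else {
            System.err.println("No l:tasks line found");
        }
    }
}

Usage would be along the lines of: hadoop fs -text <path reported in exception>.lzo | java DfsioThroughput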

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr