Posted to mapreduce-user@hadoop.apache.org by Shai Erera <se...@gmail.com> on 2011/08/03 10:22:32 UTC

Bug in FSDataset?

Hi

I've been trying to embed MiniDFSCluster into my unit tests for a long time,
but always gave up because the tests kept failing. Yesterday I gave it
another try, accidentally ran the test with an Oracle JVM (my default is
IBM's), and it passed!

I run on Windows 7 64-bit, w/ hadoop-0.20.2.jar. Both the Oracle and IBM
JVMs are 1.6 (updated to the latest).

I've done some investigation, and I found this:

The exception I get from the test is:
INFO: PacketResponder blk_-2858095604616251978_1001 0 Exception
java.io.IOException: could not move files for blk_-2858095604616251978_1001 from tmp to D:\dev\ilel\BigIndex\build\test\data\dfs\data\data1\current\blk_-2858095604616251978
    at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSDir.addBlock(FSDataset.java:104)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSDir.addBlock(FSDataset.java:92)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.addBlock(FSDataset.java:417)
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.finalizeBlock(FSDataset.java:1163)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.lastDataNodeRun(BlockReceiver.java:804)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:846)
    at java.lang.Thread.run(Thread.java:736)

It fails on the metaData.renameTo() call, which returns false. While
debugging the test, I noticed that when I run w/ the IBM JVM an open
handle is left on metaData, while with Oracle's there isn't. I guess
that's why renameTo fails. On Linux the same JVMs succeed, because there
you can move a file even while someone holds an open handle on it.
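
For illustration, here is a minimal standalone sketch of that Windows
behavior (the class and file names are made up):

import java.io.*;

public class RenameWhileOpen {
  public static void main(String[] args) throws Exception {
    File src = new File("rename_src.tmp");
    File dst = new File("rename_dst.tmp");
    FileOutputStream out = new FileOutputStream(src); // handle stays open
    out.write(1);
    // On Windows this prints 'false' (the open handle blocks the move);
    // on Linux it prints 'true'.
    System.out.println("renameTo with open handle: " + src.renameTo(dst));
    out.close();
    // Once the handle is closed, the rename succeeds on Windows too
    // (on Linux the file was already moved above, so this prints 'false').
    System.out.println("renameTo after close: " + src.renameTo(dst));
  }
}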

Digging a little deeper, I found the cause of the problem to be in
FSDataset.createBlockWriteStreams, line 779:

return new BlockWriteStreams(
    new FileOutputStream(new RandomAccessFile(f, "rw").getFD()),
    new FileOutputStream(new RandomAccessFile(metafile, "rw").getFD()));

BlockWriteStreams is given two FileOutputStreams, each initialized with a
FileDescriptor obtained from a RandomAccessFile. In the IBM JVM, the FD's
ref count is incremented in the FileOutputStream ctor, while in Oracle's
it isn't. Therefore, when you call close() on the output stream, it doesn't
really close the underlying file, because the FileDescriptor still holds
another reference (from the RandomAccessFile).

From FileOutputStream.<init>(FileDescriptor) javadocs:

"Creates an output file stream to write to the specified file
 descriptor, which represents an *existing* connection to an actual
 file in the file system."

I wrote this simple program which reproduces the bug:

public static void main(String[] args) throws Exception {
  File f = new File("check_close");

  RandomAccessFile raf = new RandomAccessFile(f, "rw");
  FileOutputStream out = null;
  try {
    // Wrap the RAF's FileDescriptor, just like FSDataset does.
    out = new FileOutputStream(raf.getFD());
  } finally {
    raf.close();
  }
  out.write((byte) 1);
  out.close();

  // Prints 'true' only if no handle is left open on the file (on Windows).
  System.out.println(f.delete());
}

If you inline the new RandomAccessFile() call (so it is never closed; see
the sketch after the exception below), the program prints 'false' on
Windows under the IBM JVM, and if you close it (like I did above), it
prints 'true'. Running the above program with Oracle's JVM yields this:

Exception in thread "main" java.io.IOException: Write error
    at java.io.FileOutputStream.write(Native Method)
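
For completeness, here is the inlined variant as a standalone program
(the class name is mine):

import java.io.*;

public class CheckCloseInlined {
  public static void main(String[] args) throws Exception {
    File f = new File("check_close");
    // The RandomAccessFile below is never closed, so under the IBM JVM
    // its FileDescriptor keeps the file open even after out.close().
    FileOutputStream out =
        new FileOutputStream(new RandomAccessFile(f, "rw").getFD());
    out.write((byte) 1);
    out.close();
    // Prints 'false' on Windows under the IBM JVM: the leaked handle
    // blocks the delete.
    System.out.println(f.delete());
  }
}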

While I think the Oracle implementation is buggy, I'm not going to argue
about it here, because it isn't stated clearly what a caller may or may
not assume. I do think, though, that FSDataset can avoid the bug if it
simply creates a new FOS like this: new FileOutputStream(file, true /*
append */). Wouldn't that achieve the same result? Is there a particular
reason why we need to open a RandomAccessFile, pass along its
FileDescriptor, and throw the RAF instance away? A sketch of the change
is below.
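
For concreteness, here is a sketch of what I have in mind, assuming the
method shape from the 0.20.2 source (untested):

private BlockWriteStreams createBlockWriteStreams(File f, File metafile)
    throws IOException {
  // Open both files directly in append mode instead of wrapping the
  // FileDescriptor of a throwaway RandomAccessFile, so close() on the
  // stream actually releases the underlying OS handle.
  return new BlockWriteStreams(
      new FileOutputStream(f, true),
      new FileOutputStream(metafile, true));
}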

Thanks,
Shai