Posted to mapreduce-user@hadoop.apache.org by Shai Erera <se...@gmail.com> on 2010/11/25 11:47:27 UTC

Implement Writable which de-serializes to disk

Hi

I need to implement a Writable which contains a lot of data, and
unfortunately I cannot break it down into smaller pieces. The output of a
Mapper is potentially a large record, which can be of any size ranging from
a few tens of MBs to a few hundreds of MBs.

Is there a way for me to de-serialize the Writable to a location on the
file system? Writable.readFields receives a DataInput only, which suggests I
should de-serialize it into RAM. If I could get a handle to the job/task's
output/temp directory, or just a temp directory, that would be great - I
could de-serialize it there and have my Mapper/Reducer read it directly
from the file system.
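Concretely, something like the sketch below is what I have in mind: readFields
streams the payload to a local temp file in fixed-size chunks, so it never sits
fully in RAM. The class and method names are made up for illustration, and the
temp-file location (java.io.tmpdir via File.createTempFile) is exactly the part
I'm not sure is safe in a task; in the real case this logic would live inside
Writable.readFields(DataInput).

```java
import java.io.*;
import java.nio.file.Files;

public class SpillDemo {
    // Stream 'length' bytes from the DataInput to a local temp file in
    // fixed-size chunks, so the payload never sits fully in RAM.
    // In a real Writable this would be the body of readFields(DataInput),
    // after reading the payload length from the stream.
    static File spillToTempFile(DataInput in, long length) throws IOException {
        // Uses java.io.tmpdir - the part I'm unsure is correct in a task.
        File spill = File.createTempFile("spill", ".bin");
        try (OutputStream os =
                 new BufferedOutputStream(new FileOutputStream(spill))) {
            byte[] buf = new byte[64 * 1024];
            long remaining = length;
            while (remaining > 0) {
                int chunk = (int) Math.min(buf.length, remaining);
                in.readFully(buf, 0, chunk); // DataInput reads exact counts
                os.write(buf, 0, chunk);
                remaining -= chunk;
            }
        }
        return spill;
    }

    public static void main(String[] args) throws IOException {
        // Round-trip check with an in-memory stream standing in for the
        // framework-supplied DataInput.
        byte[] payload = new byte[300_000];
        for (int i = 0; i < payload.length; i++) payload[i] = (byte) i;

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeLong(payload.length); // length prefix, then the raw bytes
        dos.write(payload);

        DataInput in = new DataInputStream(
            new ByteArrayInputStream(bos.toByteArray()));
        long len = in.readLong();
        File spill = spillToTempFile(in, len);
        byte[] back = Files.readAllBytes(spill.toPath());
        System.out.println(back.length == payload.length
            && java.util.Arrays.equals(back, payload)
            ? "round-trip ok" : "mismatch");
        spill.delete();
    }
}
```

The corresponding write(DataOutput) would do the reverse: write the length,
then stream the file's bytes to the output.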

I'm not sure I can rely on System.getProperty("java.io.tmpdir") - will that
work inside a task? Or is there a FileSystem API I should use instead?

Thanks,
Shai