Posted to common-dev@hadoop.apache.org by "Espen Amble Kolstad (JIRA)" <ji...@apache.org> on 2007/07/13 13:42:06 UTC
[jira] Updated: (HADOOP-1609) Optimize MapTask.MapOutputBuffer.spill() by not deserialize/serialize keys/values but use appendRaw
[ https://issues.apache.org/jira/browse/HADOOP-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Espen Amble Kolstad updated HADOOP-1609:
----------------------------------------
Attachment: spill.patch
Patch for trunk
> Optimize MapTask.MapOutputBuffer.spill() by not deserialize/serialize keys/values but use appendRaw
> ---------------------------------------------------------------------------------------------------
>
> Key: HADOOP-1609
> URL: https://issues.apache.org/jira/browse/HADOOP-1609
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.14.0
> Reporter: Espen Amble Kolstad
> Attachments: spill.patch
>
>
> In MapTask.MapOutputBuffer.spill(), every key and value is deserialized from the buffer and then serialized again when it is written to the file with append(key, value):
> {code}
> DataInputBuffer keyIn = new DataInputBuffer();
> DataInputBuffer valIn = new DataInputBuffer();
> DataOutputBuffer valOut = new DataOutputBuffer();
> while (resultIter.next()) {
>   keyIn.reset(resultIter.getKey().getData(),
>               resultIter.getKey().getLength());
>   key.readFields(keyIn);
>   valOut.reset();
>   (resultIter.getValue()).writeUncompressedBytes(valOut);
>   valIn.reset(valOut.getData(), valOut.getLength());
>   value.readFields(valIn);
>   writer.append(key, value);
>   reporter.progress();
> }
> {code}
> With complex objects, such as Nutch's ParseData or Inlinks, this round trip takes time and creates a lot of garbage.
> I've created a patch that seems to work, though I have only tested it on 0.13.0.
> It's a bit clumsy, since ValueBytes is cast to Un-/CompressedBytes in SequenceFile.Writer.
> Thoughts?
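To make the idea concrete, here is a minimal, self-contained sketch of the difference between the two approaches. It is not the attached patch and does not use Hadoop classes: RawSpillSketch, Record, spillViaObject and spillRaw are hypothetical names, and a ByteArrayOutputStream stands in for the spill file. It only illustrates that copying the already-serialized bytes yields the same output as the readFields/append round trip, without building an intermediate object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; none of these names are Hadoop APIs.
class RawSpillSketch {

    // Hypothetical record type standing in for a Writable key or value.
    static final class Record {
        String text;

        void write(DataOutputStream out) throws IOException {
            out.writeUTF(text);
        }

        void readFields(DataInputStream in) throws IOException {
            text = in.readUTF();
        }
    }

    // Helper: produce the serialized form that would sit in the sort buffer.
    static byte[] serialize(String text) {
        try {
            Record r = new Record();
            r.text = text;
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            r.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Round trip: deserialize into an object, then serialize it again.
    static byte[] spillViaObject(byte[] serialized) {
        try {
            Record r = new Record();
            r.readFields(new DataInputStream(new ByteArrayInputStream(serialized)));
            ByteArrayOutputStream file = new ByteArrayOutputStream();
            r.write(new DataOutputStream(file));
            return file.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Raw copy: write the serialized bytes straight through, no object created.
    static byte[] spillRaw(byte[] buf, int offset, int length) {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        file.write(buf, offset, length);
        return file.toByteArray();
    }
}
```

Both methods should produce byte-identical output for the same input; the raw variant simply skips the per-record allocation and parsing, which is where the savings for large objects would come from.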
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.