Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2006/09/26 17:49:53 UTC
[jira] Commented: (HADOOP-451) Add a Split interface
[ http://issues.apache.org/jira/browse/HADOOP-451?page=comments#action_12437871 ]
Owen O'Malley commented on HADOOP-451:
--------------------------------------
There is a problem with this proposal: MapTask uses FileSplit.getLength() as the number of bytes in the split in order to compute progress, while the proposed getCost() is only a relative estimate and is not guaranteed to be a byte count. I see three reasonable alternatives:
1. Rename getCost() to getLength() and assume it is a number of bytes.
2. Add a getLength() method to Split that returns the number of bytes.
3. Add a new method, "double getProgress();", to RecordReader that returns the progress within the record reader.
Option 1 has the lowest impact on existing application code, but I think option 3 is a better choice.
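To make option 3 concrete, here is a rough sketch of how a reader could report its own progress instead of MapTask deriving it from a byte count. This is only an illustration under assumptions: ProgressReader, LineProgressReader and ProgressDemo are hypothetical names, the reader works on an in-memory byte array rather than a FileSystem, and none of this is the actual RecordReader interface.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, declared first so the file can be run as a single source file.
class ProgressDemo {
    public static void main(String[] args) {
        byte[] data = "a\nbb\nccc\n".getBytes(StandardCharsets.US_ASCII);
        ProgressReader reader = new LineProgressReader(data, 0, data.length);
        String record;
        while ((record = reader.next()) != null) {
            // The caller only asks the reader for progress; it needs no split length.
            System.out.println(record + " -> progress " + reader.getProgress());
        }
    }
}

// Hypothetical, simplified stand-in for RecordReader (assumption, not the real API).
interface ProgressReader {
    /** Returns the next record, or null when the split is exhausted. */
    String next();

    /** Fraction of the split consumed so far, between 0.0 and 1.0. */
    double getProgress();
}

// Illustrative reader over newline-delimited records in a byte range.
class LineProgressReader implements ProgressReader {
    private final byte[] data;
    private final int start;
    private final int end; // exclusive
    private int pos;

    LineProgressReader(byte[] data, int start, int length) {
        this.data = data;
        this.start = start;
        this.end = Math.min(data.length, start + length);
        this.pos = start;
    }

    public String next() {
        if (pos >= end) {
            return null;
        }
        int lineStart = pos;
        while (pos < end && data[pos] != '\n') {
            pos++;
        }
        String line = new String(data, lineStart, pos - lineStart, StandardCharsets.US_ASCII);
        if (pos < end) {
            pos++; // step over the newline
        }
        return line;
    }

    public double getProgress() {
        int total = end - start;
        // An empty split is reported as complete.
        return total <= 0 ? 1.0 : (double) (pos - start) / total;
    }
}
```

With this shape, a reader whose input is not measured in bytes (rows from a database, say) could still return a sensible fraction, which is the main reason I lean toward option 3.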
Thoughts?
> Add a Split interface
> ---------------------
>
> Key: HADOOP-451
> URL: http://issues.apache.org/jira/browse/HADOOP-451
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Reporter: Doug Cutting
>
> The InputFormat interface has a method:
>   FileSplit[] getSplits();
> This should change to:
>   Split[] getSplits();
> The Split interface would look like:
>   public interface Split extends Writable {
>     /** Returns a list of hosts that contain this split.
>         This is only used to optimize task placement, so this may be empty. */
>     String[] getLocations(FileSystem fs);
>     /** The relative, estimated cost of operating on this. Typically the size of the data in the split.
>         Used to prioritize tasks in a job (high-cost tasks are run first). */
>     long getCost();
>   }