Posted to mapreduce-dev@hadoop.apache.org by Jeff Zhang <zj...@gmail.com> on 2010/02/06 08:09:08 UTC

Why not make InputSplit implement the Writable interface?

Hi all,

I was looking at the Hadoop source code and found that InputSplit does not
implement Writable. As I understand it, an InputSplit is transferred to
each TaskTracker and then deserialized, so it should implement the Writable
interface. I checked the implementations of InputSplit, and all of the
subclasses do implement the Writable interface. So I think it would be
better to let the abstract class InputSplit implement Writable, so that
users won't forget to implement write(DataOutput out) and
readFields(DataInput in) when writing a customized InputSplit.
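To make the pattern concrete, here is a minimal, self-contained sketch of the write(DataOutput)/readFields(DataInput) contract a custom split has to honor. FileChunkSplit and its fields are hypothetical and, to keep the example runnable without Hadoop on the classpath, it does not extend the real InputSplit class; a real old-API split would extend org.apache.hadoop.mapred.InputSplit and implement Writable.

```java
import java.io.*;

// Hypothetical custom split illustrating the Writable-style contract:
// serialize fields in a fixed order, deserialize them in the same order.
class FileChunkSplit {
    private String path = "";
    private long start;
    private long length;

    FileChunkSplit() {}  // no-arg constructor, needed for reflective instantiation
    FileChunkSplit(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    // Writable-style write(): emit each field in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(path);
        out.writeLong(start);
        out.writeLong(length);
    }

    // Writable-style readFields(): read the fields back in exactly that order.
    void readFields(DataInput in) throws IOException {
        path = in.readUTF();
        start = in.readLong();
        length = in.readLong();
    }

    public static void main(String[] args) throws IOException {
        FileChunkSplit split = new FileChunkSplit("/data/part-0", 0L, 64L * 1024 * 1024);

        // Round-trip through a byte buffer, as the framework would over the wire.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        split.write(new DataOutputStream(buf));

        FileChunkSplit copy = new FileChunkSplit();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(copy.path + " " + copy.start + " " + copy.length);
    }
}
```

Forgetting either method, or reading fields in a different order than they were written, silently corrupts the split on the task side; that risk is what motivates putting the methods on the abstract class.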


-- 
Best Regards

Jeff Zhang

Re: Why not make InputSplit implement the Writable interface?

Posted by Jeff Zhang <zj...@gmail.com>.
Tom,

Thanks for your help.



On Sat, Feb 6, 2010 at 9:01 AM, Tom White <to...@cloudera.com> wrote:

> Hi Jeff,
>
> InputSplit in the new MapReduce API (in the o.a.h.mapreduce package)
> does not implement Writable since splits can be serialized using any
> serialization framework - e.g. Java object serialization. You can see
> where splits are serialized at JobSplitWriter.writeNewSplits() and
> deserialized on the task node at MapTask.getSplitDetails(). This is in
> contrast to the old API which mandated that InputSplits had to be
> Writable.
>
> Cheers,
> Tom
>
> On Fri, Feb 5, 2010 at 11:09 PM, Jeff Zhang <zj...@gmail.com> wrote:
> > Hi all,
> >
> > I was looking at the Hadoop source code and found that InputSplit does
> > not implement Writable. As I understand it, an InputSplit is transferred
> > to each TaskTracker and then deserialized, so it should implement the
> > Writable interface. I checked the implementations of InputSplit, and all
> > of the subclasses do implement the Writable interface. So I think it
> > would be better to let the abstract class InputSplit implement Writable,
> > so that users won't forget to implement write(DataOutput out) and
> > readFields(DataInput in) when writing a customized InputSplit.
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >
>



-- 
Best Regards

Jeff Zhang

Re: Why not make InputSplit implement the Writable interface?

Posted by Tom White <to...@cloudera.com>.
Hi Jeff,

InputSplit in the new MapReduce API (in the o.a.h.mapreduce package)
does not implement Writable since splits can be serialized using any
serialization framework - e.g. Java object serialization. You can see
where splits are serialized at JobSplitWriter.writeNewSplits() and
deserialized on the task node at MapTask.getSplitDetails(). This is in
contrast to the old API which mandated that InputSplits had to be
Writable.
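The alternative Tom describes can be sketched with plain Java object serialization: the split class carries no write()/readFields() methods at all, and a generic framework serializes it. ChunkSplit is a hypothetical stand-in kept free of Hadoop dependencies so the sketch runs on its own; in the real new API, the job client and MapTask.getSplitDetails() go through Hadoop's pluggable serialization layer rather than raw ObjectOutputStream.

```java
import java.io.*;

// Hypothetical split with no Writable methods: Java object serialization
// handles the whole object graph for us.
class ChunkSplit implements Serializable {
    private static final long serialVersionUID = 1L;
    final String path;
    final long start;
    final long length;

    ChunkSplit(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    public static void main(String[] args) throws Exception {
        ChunkSplit split = new ChunkSplit("/data/part-0", 0L, 1024L);

        // Serialize, as the job client would when writing the split file.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(buf);
        oos.writeObject(split);
        oos.flush();

        // Deserialize, as the task side would when reading its assigned split.
        ChunkSplit copy = (ChunkSplit) new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray())).readObject();
        System.out.println(copy.path + " " + copy.start + " " + copy.length);
    }
}
```

The design trade-off: the new API decouples splits from one wire format, at the cost of no compile-time guarantee that a custom split is serializable at all.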

Cheers,
Tom

On Fri, Feb 5, 2010 at 11:09 PM, Jeff Zhang <zj...@gmail.com> wrote:
> Hi all,
>
> I was looking at the Hadoop source code and found that InputSplit does not
> implement Writable. As I understand it, an InputSplit is transferred to
> each TaskTracker and then deserialized, so it should implement the Writable
> interface. I checked the implementations of InputSplit, and all of the
> subclasses do implement the Writable interface. So I think it would be
> better to let the abstract class InputSplit implement Writable, so that
> users won't forget to implement write(DataOutput out) and
> readFields(DataInput in) when writing a customized InputSplit.
>
>
> --
> Best Regards
>
> Jeff Zhang
>