Posted to common-user@hadoop.apache.org by maha <ma...@umail.ucsb.edu> on 2011/02/08 00:38:12 UTC
Quick Question: LineSplit or BlockSplit
Hi,
I would appreciate your thoughts on whether efficiency is affected if:
1) Mappers were per line in a document
or
2) Mappers were per block of lines in a document.
The obvious difference I can see is that (1) has more mappers. Does that mean (1) will be slower because of scheduling time?
Thank you,
Maha
Re: Quick Question: LineSplit or BlockSplit
Posted by maha <ma...@umail.ucsb.edu>.
Thanks, Ted. Then I have to write my own InputFormat to read a block of lines per mapper.
NLineInputFormat didn't work for me; any working example of it would be appreciated.
Thanks again,
Maha
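For reference, a rough sketch of what NLineInputFormat is meant to do: it groups every N consecutive input lines into one split, and each split becomes one map task. Depending on the Hadoop version, it is available in the old API as org.apache.hadoop.mapred.lib.NLineInputFormat (N read from the mapred.line.input.format.linespermap property) or in the new API as org.apache.hadoop.mapreduce.lib.input.NLineInputFormat (N set with NLineInputFormat.setNumLinesPerSplit(job, n)). The code below does not use Hadoop at all; it is a plain-Java simulation of the grouping, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-Java simulation (no Hadoop dependency) of the grouping that
 * NLineInputFormat performs: every n consecutive lines form one split,
 * and each split is handed to its own map task.
 */
public class NLineSplitSketch {

    /** Hypothetical helper: groups lines into chunks of at most n lines, one chunk per split. */
    static List<List<String>> splitEveryNLines(List<String> lines, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            // subList is only a view, so copy it to make each split independent.
            splits.add(new ArrayList<>(lines.subList(i, Math.min(i + n, lines.size()))));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("line1", "line2", "line3", "line4", "line5");
        List<List<String>> splits = splitEveryNLines(lines, 2);
        for (int i = 0; i < splits.size(); i++) {
            System.out.println("map task " + i + " gets " + splits.get(i));
        }
    }
}
```

With five lines and N = 2 this produces three groups of sizes 2, 2 and 1; the last split simply holds whatever lines are left over.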
On Feb 7, 2011, at 6:32 PM, Mark Kerzner wrote:
> Thanks!
> Mark
Re: Quick Question: LineSplit or BlockSplit
Posted by Mark Kerzner <ma...@gmail.com>.
Thanks!
Mark
On Mon, Feb 7, 2011 at 8:28 PM, Ted Dunning <td...@maprtech.com> wrote:
> That is quite doable. One way to do it is to make the max split size quite
> small.
Re: Quick Question: LineSplit or BlockSplit
Posted by Ted Dunning <td...@maprtech.com>.
That is quite doable. One way to do it is to make the max split size quite
small.
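A sketch of why a small maximum split size yields more mappers, assuming the new-API org.apache.hadoop.mapreduce.lib.input.FileInputFormat, which (as I understand it) picks splitSize = max(minSize, min(maxSize, blockSize)) and lets the last split run up to 10% over that size. The maximum can be set with FileInputFormat.setMaxInputSplitSize(job, bytes) (the mapred.max.split.size property in 0.20-era releases). This is a standalone Java model of that arithmetic, not Hadoop code; the 64 MB block size and 10,000-byte file are made-up example values. Note that splits are byte ranges, not lines, so a very small maximum does not guarantee exactly one line per mapper, and some splits may contain no line start at all.

```java
/**
 * Standalone model (not Hadoop code) of how a FileInputFormat-style
 * splitter carves one file into byte-range splits:
 * splitSize = max(minSize, min(maxSize, blockSize)), and the final
 * split may be up to 10% larger than splitSize (the "slop" factor).
 */
public class SplitSizeSketch {

    static final double SPLIT_SLOP = 1.1; // tail split may be up to 10% over splitSize

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Number of splits, and hence map tasks, for one file of fileLength bytes. */
    static int countSplits(long fileLength, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileLength;
        int splits = 0;
        // Cut full-size splits while what is left is clearly more than one split.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // last, possibly short (or slightly oversized), split
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB HDFS block
        long fileLength = 10_000;           // assumed small input file (e.g. a list of paths)
        System.out.println("default max split size: "
                + countSplits(fileLength, blockSize, 1, Long.MAX_VALUE) + " split(s)");
        System.out.println("max split size = 100 bytes: "
                + countSplits(fileLength, blockSize, 1, 100) + " split(s)");
    }
}
```

With the default maximum, the whole file fits in one split (one map task); capping splits at 100 bytes turns the same file into 100 splits, i.e. 100 map tasks.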
On Mon, Feb 7, 2011 at 6:14 PM, Mark Kerzner <ma...@gmail.com> wrote:
> Ted,
>
> I am also interested in this answer.
>
> I put the name of a zip file on a line in an input file, and I want one
> mapper to read this line, and start working on it (since it now knows the
> path in HDFS). Are you saying it's not doable?
>
> Thank you,
> Mark
Re: Quick Question: LineSplit or BlockSplit
Posted by Mark Kerzner <ma...@gmail.com>.
Ted,
I am also interested in this answer.
I put the name of a zip file on a line in an input file, and I want one
mapper to read this line, and start working on it (since it now knows the
path in HDFS). Are you saying it's not doable?
Thank you,
Mark
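The per-line work described above can be sketched as a map step whose input value is a path rather than data: the mapper opens the file named on its line and processes it. This sketch assumes a local file system and uses java.util.zip.ZipFile as a stand-in; in a real job the file would come from HDFS (for example through FileSystem.open) and would be read with a ZipInputStream, since ZipFile needs a local file. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/**
 * Sketch of a map step whose input line is a path to a zip file.
 * Local java.util.zip is an assumed stand-in for reading the file from HDFS.
 */
public class ZipPathMapperSketch {

    /** Hypothetical map step: treats one input line as a zip path and lists its entry names. */
    static List<String> mapLine(String line) throws IOException {
        String path = line.trim();
        if (path.isEmpty()) {
            return new ArrayList<>(); // blank line: nothing to process
        }
        // The mapper opens the file itself; only the path travelled through the input split.
        try (ZipFile zip = new ZipFile(path)) {
            return zip.stream()
                      .map(ZipEntry::getName)
                      .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Each command-line argument plays the role of one line of the input file.
        for (String line : args) {
            System.out.println(line + " -> " + mapLine(line));
        }
    }
}
```

Run it with one or more zip paths as arguments; each argument stands in for one line of the input file, and each line's work is independent, which is what makes one-line-per-mapper splitting possible here.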
On Mon, Feb 7, 2011 at 8:10 PM, Ted Dunning <td...@maprtech.com> wrote:
> Option (1) isn't the way that things normally work. Besides, the map method is
> called many times for each construction of a mapper.
Re: Quick Question: LineSplit or BlockSplit
Posted by Ted Dunning <td...@maprtech.com>.
Option (1) isn't the way that things normally work. Besides, the map method is
called many times for each construction of a mapper.
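A plain-Java simulation of that lifecycle (not the Hadoop API itself; the CountingMapper class is hypothetical). In the new-API org.apache.hadoop.mapreduce.Mapper, run() calls setup() once, then map() once per record from the split, then cleanup(). Running options (1) and (2) over the same four lines shows where the difference lies: the total number of map() calls is identical, but per-line splits construct and set up four mappers instead of one, so any per-task cost (scheduling, task launch, setup()) is paid once per line.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-Java simulation (not the Hadoop API) of the mapper lifecycle:
 * one mapper object per split, set up once, then map() once per record.
 */
public class MapperLifecycleSketch {

    /** Hypothetical stand-in for a Hadoop Mapper that counts its calls. */
    static class CountingMapper {
        int setupCalls = 0;
        int mapCalls = 0;

        void setup() { setupCalls++; }          // one-time per-task work would go here
        void map(String record) { mapCalls++; } // per-record work
        void cleanup() { }                      // one-time teardown
    }

    /** Runs one mapper per split and returns the mappers so their counters can be inspected. */
    static List<CountingMapper> runJob(List<List<String>> splits) {
        List<CountingMapper> mappers = new ArrayList<>();
        for (List<String> split : splits) {
            CountingMapper mapper = new CountingMapper(); // one construction per split
            mapper.setup();
            for (String record : split) {
                mapper.map(record);                       // many calls per construction
            }
            mapper.cleanup();
            mappers.add(mapper);
        }
        return mappers;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("l1", "l2", "l3", "l4");
        // Option (2): one split holding all lines.
        List<CountingMapper> blockMappers = runJob(List.of(lines));
        // Option (1): one split per line.
        List<CountingMapper> lineMappers = runJob(List.of(
                List.of("l1"), List.of("l2"), List.of("l3"), List.of("l4")));
        System.out.println("per-block: " + blockMappers.size() + " mapper(s)");
        System.out.println("per-line:  " + lineMappers.size() + " mapper(s)");
    }
}
```

Both runs make four map() calls in total; only the number of mapper constructions differs (1 versus 4).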
On Mon, Feb 7, 2011 at 3:38 PM, maha <ma...@umail.ucsb.edu> wrote:
> Hi,
>
> I would appreciate your thoughts on whether efficiency is affected if:
>
> 1) Mappers were per line in a document
>
> or
>
> 2) Mappers were per block of lines in a document.
>
>
> The obvious difference I can see is that (1) has more mappers. Does
> that mean (1) will be slower because of scheduling time?
>
> Thank you,
> Maha
>