Posted to common-user@hadoop.apache.org by Naama Kraus <na...@gmail.com> on 2008/06/26 13:43:40 UTC
InputSplit boundaries
Hi,
I have a question regarding InputSplit boundaries. Does an InputSplit
necessarily fall within a single file system block's boundaries? Or can it
span across blocks? In particular, what about a FileSplit?
If it spans multiple blocks, could the blocks reside on different machines? If
so, how would that affect the locality of computations?
Thanks, Naama
--
oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo
00 oo 00 oo
"If you want your children to be intelligent, read them fairy tales. If you
want them to be more intelligent, read them more fairy tales." (Albert
Einstein)
Re: InputSplit boundaries
Posted by Richard Zhang <ri...@gmail.com>.
By default, the file system block size is the upper bound of the split size.
The minimum split size can be set by users.
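As a rough sketch of how those bounds interact: the split size is typically the block size clamped between the user-configurable minimum and maximum. The method and variable names below are illustrative assumptions, not Hadoop's actual source:

```java
// Illustrative sketch (not Hadoop's actual code) of the usual rule:
// splitSize = max(minSplitSize, min(maxSplitSize, blockSize)).
// A user-supplied minimum larger than the block size forces splits
// that span block boundaries.
public class SplitSizeSketch {
    static long computeSplitSize(long minSplitSize, long maxSplitSize, long blockSize) {
        return Math.max(minSplitSize, Math.min(maxSplitSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;  // a 64 MB HDFS block

        // With default bounds the split size equals the block size.
        System.out.println(computeSplitSize(1L, Long.MAX_VALUE, blockSize));

        // A 128 MB minimum split makes each split cover two blocks.
        System.out.println(computeSplitSize(128L * 1024 * 1024, Long.MAX_VALUE, blockSize));
    }
}
```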
Re: InputSplit boundaries
Posted by Naama Kraus <na...@gmail.com>.
Thanks for the input. Naama
Re: InputSplit boundaries
Posted by Amar Kamat <am...@yahoo-inc.com>.
Naama Kraus wrote:
> Hi,
>
> I have a question regarding InputSplit boundaries. Does an InputSplit
> necessarily fall within a single file system block's boundaries?
No.
> Or can it
> span across blocks?
Yes. It can span across blocks.
> In particular, what about a FileSplit?
> If it spans multiple blocks, could the blocks reside on different machines?
Yes.
> If
> so, how would that affect the locality of computations?
>
The remaining blocks get pulled to the machine that is executing the
task. AFAIK the blocks are streamed while the map task is executing,
so there is some amount of parallelism there. FYI, there is an
optimization filed on this; see
https://issues.apache.org/jira/browse/HADOOP-3293.
Amar
> Thanks, Naama
Re: iterative map-reduce
Posted by Paco NATHAN <ce...@gmail.com>.
A simple example of Hadoop application code that follows that pattern
(iterate until a condition is met) is in the "jyte" section here:
http://code.google.com/p/ceteri-mapred/
The loop and the condition test are in the same code that calls ToolRunner
and JobClient.
Best,
Paco
Re: iterative map-reduce
Posted by Christian Ulrik Søttrup <so...@nbi.dk>.
Hi Shirley,
I am basically doing as Qin suggested.
I am running a job iteratively until some condition is met.
My main looks something like this (in pseudocode):

main:
  while (!converged):
    make a new jobconf
    set up the jobconf
    run the job
    check the reporter for statistics
    decide if converged
I use a custom reporter to check on the fitness of the solution in the
reduce phase.
If you need more (real Java) code, drop me a line.
Cheers,
Christian
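The driver loop above can be sketched as plain Java. This is a minimal, self-contained sketch: the actual JobConf setup, JobClient submission, and reporter/counter readout are stubbed out in runJobAndGetFitness, and the halving of the fitness value is a made-up stand-in for whatever statistic the real job reports:

```java
// Sketch of an iterate-until-converged MapReduce driver.
// runJobAndGetFitness stands in for: build a JobConf, run the job,
// and read the fitness statistic back from a custom reporter/counter.
public class IterativeDriverSketch {
    // Stub for one MapReduce pass; here we pretend each pass halves the error.
    static double runJobAndGetFitness(double previousFitness) {
        return previousFitness / 2.0;
    }

    // Driver loop: rerun the job until the reported fitness drops below
    // the convergence threshold; returns the number of passes run.
    static int runUntilConverged(double initialFitness, double threshold) {
        double fitness = initialFitness;
        int iterations = 0;
        while (fitness >= threshold) {   // decide if converged
            fitness = runJobAndGetFitness(fitness);
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("converged after "
                + runUntilConverged(100.0, 1.0) + " passes");
    }
}
```

In real code each pass would construct a fresh JobConf, since a submitted job's configuration should not be reused.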
Re: iterative map-reduce
Posted by Qin Gao <qi...@cs.cmu.edu>.
I think it has nothing to do with the framework; just treat the MapReduce
job as a batch process or a subroutine, and call it iteratively. If there
is such an interface, I am also interested to know.
Re: iterative map-reduce
Posted by Shirley Cohen <sh...@cis.upenn.edu>.
Thanks... would the iterative script be run outside of Hadoop? I was
actually trying to figure out if the framework could handle iterations.
Shirley
Re: iterative map-reduce
Posted by Qin Gao <qi...@cs.cmu.edu>.
If you are using Java, just create the job configuration again and run it;
otherwise you just need to write an iterative script.
iterative map-reduce
Posted by Shirley Cohen <sc...@cs.utexas.edu>.
Hi,
I want to call a map-reduce program recursively until some condition
is met. How do I do that?
Thanks,
Shirley