Posted to common-user@hadoop.apache.org by Naama Kraus <na...@gmail.com> on 2008/06/26 13:43:40 UTC

InputSplit boundaries

Hi,

I have a question regarding InputSplit boundaries. Does an InputSplit
necessarily fall within a single file system block's boundaries? Or can it
span across blocks? In particular, what about a FileSplit?
If it spans across blocks, could the blocks reside on different machines? If
so, how would it affect locality of computation?

Thanks, Naama

-- 
oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo
00 oo 00 oo
"If you want your children to be intelligent, read them fairy tales. If you
want them to be more intelligent, read them more fairy tales." (Albert
Einstein)

Re: InputSplit boundaries

Posted by Richard Zhang <ri...@gmail.com>.
The file system block size is the upper bound of the split size. The
minimum split size can be set by users.

On Thu, Jun 26, 2008 at 12:23 PM, Naama Kraus <na...@gmail.com> wrote:

> Thanks for the input. Naama
>
> [...]

Re: InputSplit boundaries

Posted by Naama Kraus <na...@gmail.com>.
Thanks for the input. Naama

On Thu, Jun 26, 2008 at 2:12 PM, Amar Kamat <am...@yahoo-inc.com> wrote:

> Naama Kraus wrote:
>
>> Hi,
>>
>> I have a question regarding InputSplit boundaries. Does an InputSplit
>> necessarily fall within a single file system block boundaries ?
>>
> No.
>
>> Or can it
>> span across blocks ?
>>
> Yes. It can span across blocks.
>
>> In particular, what about a FileSplit ?
>> If it spans among blocks, could the blocks reside in different machines ?
>>
> Yes.
>
>> If
>> so, how would it effect locality of computations ?
>>
>>
> The remaining blocks get pulled to the machine that is executing the task.
> Afaik the blocks are streamed while the map task is getting executed and
> hence there is some amount of parallelism there. FYI there is one
> optimization filed on this. See
> https://issues.apache.org/jira/browse/HADOOP-3293.
> Amar
>
>> Thanks, Naama
>>
>>
>>
>
>



Re: InputSplit boundaries

Posted by Amar Kamat <am...@yahoo-inc.com>.
Naama Kraus wrote:
> Hi,
>
> I have a question regarding InputSplit boundaries. Does an InputSplit
> necessarily fall within a single file system block's boundaries?
No.
> Or can it span across blocks?
Yes. It can span across blocks.
> In particular, what about a FileSplit?
> If it spans across blocks, could the blocks reside on different machines?
Yes.
> If so, how would it affect locality of computation?
The remaining blocks get pulled to the machine that is executing the
task. AFAIK the blocks are streamed while the map task is executing,
so there is some amount of parallelism there. FYI, there is an
optimization filed on this; see
https://issues.apache.org/jira/browse/HADOOP-3293.
Amar
> Thanks, Naama
>
>   
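Since a split is just a byte range over a file, a split that crosses a block
boundary overlaps several DFS blocks, each potentially stored on a different
set of machines. A toy model (not Hadoop code) of which block indices a
split's range touches, assuming fixed-size blocks:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitBlocks {
    // Given a block size and a split's byte range [offset, offset+length),
    // return the indices of the DFS blocks the split overlaps.
    // Real HDFS tracks per-block host locations; this only models the
    // block arithmetic.
    static List<Integer> blocksForSplit(long offset, long length, long blockSize) {
        List<Integer> blocks = new ArrayList<>();
        int first = (int) (offset / blockSize);
        int last = (int) ((offset + length - 1) / blockSize);
        for (int b = first; b <= last; b++) {
            blocks.add(b);
        }
        return blocks;
    }

    public static void main(String[] args) {
        long block = 64L * 1024 * 1024; // 64 MB blocks
        // A split starting near the end of block 0 spills into block 1,
        // so its data may live on two different machines.
        System.out.println(blocksForSplit(block - 10, 100, block)); // [0, 1]
    }
}
```

A split covering only block 0 can be scheduled entirely data-local; the
two-block split above is what forces the remote streaming Amar describes.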


Re: iterative map-reduce

Posted by Paco NATHAN <ce...@gmail.com>.
A simple example of Hadoop application code that follows that pattern
(iterate until a condition is met) is in the "jyte" section here:

   http://code.google.com/p/ceteri-mapred/

The loop and the condition test are in the same code that calls ToolRunner
and JobClient.

Best,
Paco


On Tue, Jul 29, 2008 at 10:03 AM, Christian Ulrik Søttrup
<so...@nbi.dk> wrote:
> Hi Shirley,
>
> I am basically doing as Qin suggested.
> I am running a job iteratively until some condition is met.
> [...]

Re: iterative map-reduce

Posted by Christian Ulrik Søttrup <so...@nbi.dk>.
Hi Shirley,

I am basically doing as Qin suggested.
I am running a job iteratively until some condition is met.
My main looks something like this (in pseudocode):

main:
 while (!converged):
   make new jobconf
   setup jobconf
   run jobconf
   check reporter for statistics
   decide if converged

I use a custom reporter to check on the fitness of the solution in the 
reduce phase.

If you need more (real Java) code, drop me a line.

Cheers,
Christian
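Christian's pseudocode maps onto a driver loop like the following sketch.
Here JobRunner stands in for the real JobConf setup, JobClient submission,
and the custom-reporter/counter read; the fitness-delta convergence test is
a hypothetical example of "decide if converged":

```java
public class IterativeDriver {
    // Stand-in for: build a JobConf, run it via JobClient, and read a
    // fitness statistic back from the job's reported counters.
    interface JobRunner {
        double runJob(int iteration);
    }

    // Iterate until the fitness change between rounds falls below a
    // tolerance, or a maximum iteration count is hit. Returns the number
    // of jobs actually run.
    static int iterateUntilConverged(JobRunner runner, double tol, int maxIters) {
        double prev = Double.MAX_VALUE;
        for (int i = 0; i < maxIters; i++) {
            double fitness = runner.runJob(i); // "run jobconf" + "check reporter"
            if (Math.abs(prev - fitness) < tol) {
                return i + 1; // "decide if converged"
            }
            prev = fitness;
        }
        return maxIters; // safety cap so a non-converging job chain stops
    }

    public static void main(String[] args) {
        // Toy "job" whose fitness halves each round; converges quickly.
        int runs = iterateUntilConverged(i -> 100.0 / (1 << i), 0.5, 20);
        System.out.println("jobs run: " + runs);
    }
}
```

The loop lives entirely in the driver JVM, outside the map-reduce framework
itself, which matches Qin's point below: each iteration is an ordinary job
submission treated as a subroutine.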

Qin Gao wrote:
> [...]


Re: iterative map-reduce

Posted by Qin Gao <qi...@cs.cmu.edu>.
I think it has nothing to do with the framework; just treat the MapReduce
job as a batch process or a subroutine, and call it iteratively. If there
is such an interface, I am also interested to know about it.



On Tue, Jul 29, 2008 at 10:31 AM, Shirley Cohen <sh...@cis.upenn.edu> wrote:

> Thanks... would the iterative script be run outside of Hadoop? I was
> actually trying to figure out if the framework could handle iterations.
>
> Shirley
> [...]

Re: iterative map-reduce

Posted by Shirley Cohen <sh...@cis.upenn.edu>.
Thanks... would the iterative script be run outside of Hadoop? I was  
actually trying to figure out if the framework could handle iterations.

Shirley

On Jul 29, 2008, at 9:10 AM, Qin Gao wrote:

> If you are using Java, just create the job configuration again and run it;
> otherwise you just need to write an iterative script.


Re: iterative map-reduce

Posted by Qin Gao <qi...@cs.cmu.edu>.
If you are using Java, just create the job configuration again and run it;
otherwise you just need to write an iterative script.

On Tue, Jul 29, 2008 at 9:57 AM, Shirley Cohen <sc...@cs.utexas.edu> wrote:

> [...]
>

iterative map-reduce

Posted by Shirley Cohen <sc...@cs.utexas.edu>.
Hi,

I want to call a map-reduce program recursively until some condition  
is met.  How do I do that?

Thanks,

Shirley