Posted to user@hadoop.apache.org by Sai Sai <sa...@yahoo.in> on 2013/09/27 07:12:53 UTC

Re: Input Split vs Task vs attempt vs computation

Hi
I have a few questions I am trying to understand:

1. Is each input split the same as a record (a record can be a single line or multiple lines)?

2. Is each task a collection of a few computations or attempts?

For example, say I have a small file with 5 lines.

By default, each map computation is performed on 1 line.
So in total, 5 computations are done on 1 node.

This means the JT will spawn 1 JVM for the TaskTracker on a node,
and another JVM for the map task, which will instantiate 5 map objects, 1 for each line.

The map task (MT) JVM is called the task, and it will have 5 attempts, 1 for each line.
This means an attempt is the same as a computation.

Please let me know if anything is incorrect.
Thanks
Sai

Re: Input Split vs Task vs attempt vs computation

Posted by Sonal Goyal <so...@gmail.com>.
Inline

Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>




On Fri, Sep 27, 2013 at 10:42 AM, Sai Sai <sa...@yahoo.in> wrote:

> Hi
> I have a few questions I am trying to understand:
>
> 1. Is each input split the same as a record (a record can be a single line or
> multiple lines)?
>

An InputSplit is the chunk of input handled by a single map task, and it will
generally contain multiple records. The RecordReader reads the split and
provides its records to the map task as key-value pairs. See
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/InputSplit.html
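
To make the split vs. record distinction concrete, here is a small Python simulation. It is not Hadoop's API: compute_splits and line_records are hypothetical helpers. It assumes a split is simply a byte range of one file and that records are lines keyed by their byte offset, as with the default TextInputFormat. The boundary rule (a split owns every line that starts inside its range and reads that line to its end, even past the range) is a simplification of what Hadoop's LineRecordReader does. In real Hadoop the split size normally follows the HDFS block size, not a tiny value like 8 bytes.

```python
from typing import Iterator, List, Tuple

# Hypothetical model: a split is a byte range [start, end) of a single file.
Split = Tuple[int, int]


def compute_splits(file_size: int, split_size: int) -> List[Split]:
    """Cut a file into fixed-size byte ranges; each range would go to one map task."""
    return [(start, min(start + split_size, file_size))
            for start in range(0, file_size, split_size)]


def line_records(data: bytes, split: Split) -> Iterator[Tuple[int, str]]:
    """Yield (byte_offset, line) records for every line that starts inside the split."""
    start, end = split
    end = min(end, len(data))
    pos = start
    # A partial line at the start of the split belongs to the previous split: skip it.
    if start != 0 and data[start - 1:start] != b"\n":
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    while pos < end:
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        # The line may run past `end`; it is still read in full by this split.
        yield pos, data[pos:stop].decode()
        pos = stop + 1


data = b"a\nbb\nccc\ndddd\neeeee\n"  # a 5-line file
for split in compute_splits(len(data), split_size=8):
    print(split, list(line_records(data, split)))
# (0, 8) [(0, 'a'), (2, 'bb'), (5, 'ccc')]
# (8, 16) [(9, 'dddd'), (14, 'eeeee')]
# (16, 20) []
```

With an 8-byte split size the 5-line file becomes 3 splits (so 3 map tasks), yet each line is still read exactly once. If the whole file fits in one split, there is 1 map task that receives all 5 records.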

>
> 2. Is each task a collection of a few computations or attempts?
>
> For example, say I have a small file with 5 lines.
> By default, each map computation is performed on 1 line.
> So in total, 5 computations are done on 1 node.
>
> This means the JT will spawn 1 JVM for the TaskTracker on a node,
> and another JVM for the map task, which will instantiate 5 map objects, 1 for
> each line.
>
I am not sure what you mean by 5 map objects, but yes, the mapper will be
invoked 5 times, once for each line.
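
On the "5 map objects" point, a minimal Python sketch of the usual pattern, assuming one mapper object per map task. Hadoop's new-API Mapper.run() calls setup() once, map() once per record, and cleanup() once; the class below only imitates that shape and is not Hadoop code (LineLengthMapper and run_map_task are hypothetical names).

```python
from typing import Iterable, List, Tuple


class LineLengthMapper:
    """Hypothetical mapper that emits (line, length) for every input record."""

    instances = 0  # counts how many mapper objects have been created

    def __init__(self) -> None:
        LineLengthMapper.instances += 1
        self.calls = 0
        self.output: List[Tuple[str, int]] = []

    def setup(self) -> None:
        pass  # per-task initialisation would go here

    def map(self, key: int, value: str) -> None:
        self.calls += 1
        self.output.append((value, len(value)))

    def cleanup(self) -> None:
        pass  # per-task teardown would go here

    def run(self, records: Iterable[Tuple[int, str]]) -> None:
        # Same shape as Mapper.run(): setup once, map() per record, cleanup once.
        self.setup()
        for key, value in records:
            self.map(key, value)
        self.cleanup()


def run_map_task(records: Iterable[Tuple[int, str]]) -> LineLengthMapper:
    mapper = LineLengthMapper()  # a single object for the whole task
    mapper.run(records)
    return mapper


records = [(0, "a"), (2, "bb"), (5, "ccc"), (9, "dddd"), (14, "eeeee")]
task = run_map_task(records)
print(LineLengthMapper.instances, task.calls)  # 1 5
```

So a 5-line split yields one mapper object whose map() method runs 5 times, rather than 5 separate map objects.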


> The map task (MT) JVM is called the task, and it will have 5 attempts, 1 for
> each line. This means an attempt is the same as a computation.
>
> Please let me know if anything is incorrect.
> Thanks
> Sai
>
>
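
On the last question (attempts): a task attempt is one execution of a whole task, not one map() call. If an attempt fails, or if speculative execution launches a backup copy, the same task gets another attempt, which reprocesses the entire split; the per-line map() calls all happen inside a single attempt. The Python sketch below is a hypothetical illustration (run_task_with_attempts is not a Hadoop function, and the task ID is made up). It borrows Hadoop's task_/attempt_ ID naming, and its default of 4 tries mirrors the Hadoop 1.x default of mapred.map.max.attempts.

```python
from typing import Callable, List


def run_task_with_attempts(task_id: str,
                           run_attempt: Callable[[int], bool],
                           max_attempts: int = 4) -> List[str]:
    """Retry one task until an attempt succeeds; return the attempt IDs that were used."""
    tried: List[str] = []
    for n in range(max_attempts):
        # task_..._m_000000 -> attempt_..._m_000000_0, attempt_..._m_000000_1, ...
        attempt_id = task_id.replace("task_", "attempt_", 1) + f"_{n}"
        tried.append(attempt_id)
        # In this model, each call stands for processing the task's whole split.
        if run_attempt(n):
            return tried
    raise RuntimeError(f"{task_id} failed after {max_attempts} attempts")


# Hypothetical task ID; the first attempt fails (e.g. a lost node), the second succeeds.
attempts = run_task_with_attempts("task_201309270712_0001_m_000000",
                                  lambda n: n >= 1)
print(attempts)
```

In the normal case a task finishes with a single attempt (suffix _0), so the number of attempts is unrelated to the number of lines in the split.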
