Posted to mapreduce-user@hadoop.apache.org by brisk <my...@gmail.com> on 2012/07/30 20:47:19 UTC

output/input ratio > 1 for map tasks?

Hi,

Does anybody know of cases where the output/input ratio for map tasks is
larger than 1? The only ones I can think of are sort, where it is 1, and
search, where it is usually smaller than 1...

Thanks,
Ethan

Re: output/input ratio > 1 for map tasks?

Posted by Owen O'Malley <om...@apache.org>.
On Mon, Jul 30, 2012 at 11:47 AM, brisk <my...@gmail.com> wrote:

> Hi,
>
> Does anybody know if there are some cases where the output/input ratio for
> map tasks is larger than 1? I can just think of for the sort, it's 1 and
> for the search job it's usually smaller than 1...
>

The traditional case is building an inverted index of some sort. Your input
is the input documents, the shuffle is the set of search terms and their
targets and the output is the final index. The shuffle is much larger than
either the input or output.
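
To make the size relationship concrete, here is a minimal sketch in plain Python (not Hadoop code). The function names, the two sample documents, and the tab-separated text encoding used to count bytes are all assumptions made for illustration; real jobs serialize with Writables and may compress or combine the map output, so actual numbers will differ.

```python
def index_map(doc_id, text):
    """Map step: emit one (term, doc_id) posting per token in the document."""
    return [(term.lower(), doc_id) for term in text.split()]


def index_reduce(postings):
    """Reduce step: collapse postings into term -> sorted list of unique doc ids."""
    index = {}
    for term, doc_id in postings:
        index.setdefault(term, set()).add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical input documents.
docs = {"doc1": "to be or not to be", "doc2": "to do is to be"}

# "Shuffle" here is simply all map output, before any grouping.
shuffle = [p for doc_id, text in docs.items() for p in index_map(doc_id, text)]
index = index_reduce(shuffle)

# Assumed encoding for size accounting: "key<TAB>value<NEWLINE>" in UTF-8.
input_bytes = sum(len(text.encode("utf-8")) for text in docs.values())
shuffle_bytes = sum(len(f"{t}\t{d}\n".encode("utf-8")) for t, d in shuffle)
output_bytes = sum(
    len(f"{t}\t{','.join(ids)}\n".encode("utf-8")) for t, ids in index.items()
)

print("input:", input_bytes, "shuffle:", shuffle_bytes, "output:", output_bytes)
```

Every token carries its document id through the shuffle, while the final index stores each term only once, which is why the intermediate data ends up as the largest of the three in this toy run.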

-- Owen

Re: output/input ratio > 1 for map tasks?

Posted by brisk <my...@gmail.com>.
Thanks, Niels.

So do you mean that in this case the map output size (in bytes) could be
larger than the input size (e.g. 64MB by default)? I will also run a
test later...

Best,
Ethan

On Mon, Jul 30, 2012 at 1:15 PM, Niels Basjes <Ni...@basjes.nl> wrote:

> Hi,
>
> On Mon, Jul 30, 2012 at 8:47 PM, brisk <my...@gmail.com> wrote:
> > Does anybody know if there are some cases where the output/input ratio
> for
> > map tasks is larger than 1? I can just think of for the sort, it's 1 and
> for
> > the search job it's usually smaller than 1...
>
> For a simple example: Have a look at the WordCount example.
>
> Input of a single map call is 1 record: "This is a line"
> Output are 4 records:
> This    1
> is       1
> a        1
> line     1
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>

Re: output/input ratio > 1 for map tasks?

Posted by Niels Basjes <Ni...@basjes.nl>.
Hi,

On Mon, Jul 30, 2012 at 8:47 PM, brisk <my...@gmail.com> wrote:
> Does anybody know if there are some cases where the output/input ratio for
> map tasks is larger than 1? I can just think of for the sort, it's 1 and for
> the search job it's usually smaller than 1...

For a simple example: Have a look at the WordCount example.

The input of a single map call is 1 record: "This is a line"
The output is 4 records:
This    1
is      1
a       1
line    1
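
The same effect can be checked in bytes with a small plain-Python sketch (not Hadoop code). The function names and the "word<TAB>count<NEWLINE>" text encoding are assumptions for illustration only; Hadoop's own serialization, a combiner, or map output compression would change the actual sizes.

```python
def wordcount_map(line):
    """Emit one (word, 1) pair per whitespace-separated token, WordCount style."""
    return [(word, 1) for word in line.split()]


def serialized_size(records):
    """Size in bytes under the assumed 'key<TAB>value<NEWLINE>' UTF-8 encoding."""
    return sum(len(f"{key}\t{value}\n".encode("utf-8")) for key, value in records)


line = "This is a line"
records = wordcount_map(line)

# Count the input line including its trailing newline.
input_bytes = len((line + "\n").encode("utf-8"))
output_bytes = serialized_size(records)

print("records out:", len(records))
print("input bytes:", input_bytes, "output bytes:", output_bytes)
print("output/input ratio:", output_bytes / input_bytes)
```

Under this encoding, each emitted pair adds a count and a separator to every word, so the map output is larger than the input line in both record count and bytes.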

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes