Posted to mapreduce-user@hadoop.apache.org by Anthony Urso <an...@gmail.com> on 2010/11/07 14:38:31 UTC

Predicting how many values will I see in a call to reduce?

Is there any way to know how many values I will see in a call to
reduce without first counting through them all with the iterator?

Under 0.21? 0.20? 0.19?

Thanks,
Anthony

Re: Predicting how many values will I see in a call to reduce?

Posted by Lance Norskog <go...@gmail.com>.
It is key to Hadoop's scheduling model that it doesn't have to tell you
how many values there are or when they will arrive. To do so, it would
have to store up all of the data for your key before activating your
reducer. That is exactly what it cannot do and still scale.

(right?)

On Mon, Nov 8, 2010 at 3:32 AM, Niels Basjes <Ni...@basjes.nl> wrote:
> Hi,
>
> 2010/11/7 Anthony Urso <an...@gmail.com>
>>
>> Is there any way to know how many values I will see in a call to
>> reduce without first counting through them all with the iterator?
>>
>> Under 0.21? 0.20? 0.19?
>
> I looked for an answer to the same question a while ago and came to the
> conclusion that you can't.
> The main limitation is that the Iterator does not have a "size" or
> "length" method.
>
> --
> Met vriendelijke groeten,
>
> Niels Basjes
>



-- 
Lance Norskog
goksron@gmail.com

Re: Predicting how many values will I see in a call to reduce?

Posted by Niels Basjes <Ni...@basjes.nl>.
Hi,

2010/11/7 Anthony Urso <an...@gmail.com>

> Is there any way to know how many values I will see in a call to
> reduce without first counting through them all with the iterator?
>
> Under 0.21? 0.20? 0.19?
>

I looked for an answer to the same question a while ago and came to the
conclusion that you can't.
The main limitation is that the Iterator does not have a "size" or
"length" method.

-- 
Met vriendelijke groeten,

Niels Basjes

Re: Predicting how many values will I see in a call to reduce?

Posted by Owen O'Malley <om...@apache.org>.
On Sun, Nov 7, 2010 at 5:38 AM, Anthony Urso <an...@gmail.com> wrote:

> Is there any way to know how many values I will see in a call to
> reduce without first counting through them all with the iterator?


No, there currently isn't. The framework doesn't have the information until
the iterator is exhausted. The iterator's contents are not held in memory;
they are synthesized as the result of an N-way merge sort from disk and
memory. If your application needs that knowledge, you could compute it in
the application itself. If your value sets are small enough to fit in
memory, the easiest thing to do is just read them into a list from the
iterator (cloning the values to avoid the object reuse!).
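
A rough sketch of the "read them into a list, cloning each value" idea, in
plain Java so that it runs without Hadoop on the classpath. Everything named
here is an assumption made for illustration: MutableValue is a hypothetical
stand-in for a mutable value type, and reusingIterator only imitates an
iterator that hands back the same object on every call. It is not the
framework's actual iterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class BufferValues {

    /** Hypothetical stand-in for a mutable value type (not a Hadoop class). */
    static final class MutableValue {
        int value;
        MutableValue(int value) { this.value = value; }
        MutableValue copy() { return new MutableValue(value); }
    }

    /** Imitates object reuse: every next() returns the same instance, refilled. */
    static Iterator<MutableValue> reusingIterator(final int[] data) {
        return new Iterator<MutableValue>() {
            private final MutableValue shared = new MutableValue(0);
            private int position = 0;

            @Override public boolean hasNext() { return position < data.length; }

            @Override public MutableValue next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.value = data[position++];
                return shared;
            }

            @Override public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    /** Buffers the values, copying each one; the list size is then the count. */
    static List<MutableValue> bufferWithCopy(Iterator<MutableValue> values) {
        List<MutableValue> buffered = new ArrayList<>();
        while (values.hasNext()) {
            buffered.add(values.next().copy());
        }
        return buffered;
    }

    /** The mistake to avoid: every list entry refers to the one reused object. */
    static List<MutableValue> bufferWithoutCopy(Iterator<MutableValue> values) {
        List<MutableValue> buffered = new ArrayList<>();
        while (values.hasNext()) {
            buffered.add(values.next());
        }
        return buffered;
    }

    /** Extracts the int payloads so the two results are easy to compare. */
    static int[] values(List<MutableValue> list) {
        int[] out = new int[list.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = list.get(i).value;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = {3, 5, 8};
        System.out.println(java.util.Arrays.toString(
                values(bufferWithCopy(reusingIterator(data)))));    // [3, 5, 8]
        System.out.println(java.util.Arrays.toString(
                values(bufferWithoutCopy(reusingIterator(data))))); // [8, 8, 8]
    }
}
```

Once the values are buffered, the count is just the size of the list, and the
list can be walked as many times as needed. The cost is that one key's whole
value set has to fit in memory, which is the caveat stated above.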

You could try using the resettable iterators, but I don't know how reliable
they are.

-- Owen