Posted to common-user@hadoop.apache.org by Rares Vernica <rv...@gmail.com> on 2009/06/14 00:40:54 UTC

get number of values for a key

Hello,

In Reduce, can I get the number of values for the current key without
iterating over them? Does Hadoop have this number?

Or, at least, can I get the total number of pairs that will be processed
by the current Reduce instance? I am pretty sure that Hadoop already
knows this number because it has sorted them.

BTW, the iterators given to Reduce are one-time use iterators, right?

Thanks!
Rares

Re: get number of values for a key

Posted by Rares Vernica <rv...@gmail.com>.
On 6/14/09, Jothi Padmanabhan <jo...@yahoo-inc.com> wrote:
>
> No, there is no way to get the number of values for the current key.

Thanks! What about the number of pairs that will be processed by the
current Reduce instance?

Re: get number of values for a key

Posted by Jothi Padmanabhan <jo...@yahoo-inc.com>.
No, there is no way to get the number of values for the current key.


==> BTW, the iterators given to Reduce are one-time use iterators, right?

HADOOP-5266 introduced mark/reset support for the values iterator. You may
want to take a look at that.
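
Since the count is not exposed, one possible workaround is to buffer the
values during a single pass, which yields both the count and a way to
traverse them again. The following is only a minimal sketch in plain Java
with no Hadoop dependency: BufferedValues and drain are hypothetical names,
not Hadoop APIs, and the list of strings stands in for the values of one
key. The copy argument is there because Hadoop reuses the same value object
between calls to next(), so real Writable values would need to be copied
before being stored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical helper (not part of Hadoop): drains a one-pass iterator into a list. */
public class BufferedValues {

    /**
     * Consumes the iterator exactly once and returns a list of copied values.
     * The copy function matters when the producer hands back the same mutable
     * object on every call to next().
     */
    public static <V> List<V> drain(Iterator<V> values, UnaryOperator<V> copy) {
        List<V> buffered = new ArrayList<>();
        while (values.hasNext()) {
            buffered.add(copy.apply(values.next()));
        }
        return buffered;
    }

    public static void main(String[] args) {
        // Stand-in for the values passed to reduce() for a single key.
        Iterator<String> values = Arrays.asList("a", "b", "c").iterator();

        // Strings are immutable, so the identity function is a safe "copy" here.
        List<String> buffered = drain(values, v -> v);

        System.out.println("count = " + buffered.size());
        for (String v : buffered) { // second pass, over the buffered copy
            System.out.println(v);
        }
    }
}
```

This keeps every value for the key in memory, so it only suits keys with a
modest number of values.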

Cheers
Jothi

On 6/15/09 7:07 AM, "jason hadoop" <ja...@gmail.com> wrote:

> It would be nice if there were an interface-compliant way. Perhaps it
> becomes available in the 0.20 and later APIs.


Re: get number of values for a key

Posted by jason hadoop <ja...@gmail.com>.
It would be nice if there were an interface-compliant way. Perhaps it
becomes available in the 0.20 and later APIs.



