Posted to mapreduce-user@hadoop.apache.org by Utkarsh Gupta <Ut...@infosys.com> on 2013/01/10 09:41:40 UTC

Limitation of key-value pairs for a particular key.

Hi,

I am using Apache Hadoop 1.0.4 on a 10-node cluster of commodity machines running Ubuntu 12.04 Server edition. I am having an issue with my MapReduce code. While debugging, I found that the reducer can take 262145 values for a particular key. If there are more values, they seem to be corrupted. I checked the values when emitting them from the map and again in the reducer.
I am wondering whether Hadoop has any such limitation or whether this is a configuration problem.


Thanks and Regards
Utkarsh Gupta



**************** CAUTION - Disclaimer *****************
This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely 
for the use of the addressee(s). If you are not the intended recipient, please 
notify the sender by e-mail and delete the original message. Further, you are not 
to copy, disclose, or distribute this e-mail or its contents to any other person and 
any such actions are unlawful. This e-mail may contain viruses. Infosys has taken 
every reasonable precaution to minimize this risk, but is not liable for any damage 
you may sustain as a result of any virus in this e-mail. You should carry out your 
own virus checks before opening the e-mail or attachment. Infosys reserves the 
right to monitor and review the content of all messages sent to or from this e-mail 
address. Messages sent to or from this e-mail address may be stored on the 
Infosys e-mail system.
***INFOSYS******** End of Disclaimer ********INFOSYS***

RE: Limitation of key-value pairs for a particular key.

Posted by Sven Groot <sg...@gmail.com>.

Hi,

I think I know what's going on here. It has to do with how many spills the
map task performs.

You are emitting the numbers in order, so if there is only one spill, they
stay in order. For a larger number of records, the map task creates more
than one spill, and the spills must be merged. The merge does not preserve
the original order of values that share a key.

If you want the original order to be preserved, you must set io.sort.mb
and/or io.sort.record.percent such that the map task requires only a single
spill.
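
As a rough check on the single-spill condition, the sketch below estimates how many records the map-side sort buffer accepts before the first spill starts. It rests on assumptions that should be verified against the Hadoop 1.x source and your own configuration: that io.sort.record.percent of the io.sort.mb buffer is reserved for per-record accounting, that each record costs 16 bytes of accounting, and that a spill starts once io.sort.spill.percent of that space is in use. The class and method names are made up for illustration and are not part of Hadoop.

```java
/** Back-of-the-envelope estimate; not part of Hadoop. */
final class SpillEstimate {
    // Assumed bytes of accounting data kept per record in Hadoop 1.x.
    static final int ACCOUNTING_BYTES_PER_RECORD = 16;

    /**
     * Estimated number of records buffered before the first spill starts,
     * considering only the record-accounting limit (not the data limit).
     */
    static int recordsBeforeSpill(int sortMb, float recordPercent, float spillPercent) {
        int bufferBytes = sortMb << 20; // io.sort.mb expressed in bytes
        int accountingBytes = (int) (bufferBytes * recordPercent);
        // Round down to a whole number of accounting entries.
        accountingBytes -= accountingBytes % ACCOUNTING_BYTES_PER_RECORD;
        int recordCapacity = accountingBytes / ACCOUNTING_BYTES_PER_RECORD;
        return (int) (spillPercent * recordCapacity);
    }

    public static void main(String[] args) {
        // Assumed defaults: io.sort.mb=100, io.sort.record.percent=0.05,
        // io.sort.spill.percent=0.80.
        System.out.println(recordsBeforeSpill(100, 0.05f, 0.80f)); // prints 262144
    }
}
```

Under those assumed defaults the estimate is 262144 records, which is very close to the 262145 values reported in the original post and would be consistent with the order changing once a second spill is needed. Large records can fill the data part of the buffer first and trigger a spill earlier; the sketch ignores that case.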

 

Cheers,
Sven

From: Utkarsh Gupta [mailto:Utkarsh_Gupta@infosys.com]
Sent: 18 January 2013 18:25
To: mapreduce-user@hadoop.apache.org
Subject: RE: Limitation of key-value pairs for a particular key.

You are right. Actually, we were expecting the values to be sorted.

We tried to reproduce the problem with this simple code:

    private final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        int N = 30000;
        for (int i = 0; i < N; i++) {
            word.set(i + "");
            System.out.println(i);
            context.write(one, word);
        }
    }

For smaller N the numbers were in order, but for N = 3000000 the order was
not maintained.

From: Harsh J [mailto:harsh@cloudera.com]
Sent: Thursday, January 17, 2013 1:57 AM
To: mapreduce-user
Subject: RE: Limitation of key-value pairs for a particular key.

We don't sort values (only keys) nor apply any manual limits in MR. Can you
post a reproducible test case to support your suspicion?

On Jan 16, 2013 4:34 PM, "Utkarsh Gupta" <Utkarsh_Gupta@infosys.com> wrote:

Hi,

Thanks for the response. There were some issues with my code. I have checked
that in detail.

All the values from the map are present in the reducer, but not in sorted
order. This happens when the number of values for a key is too large.

Thanks
Utkarsh

From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
Sent: Thursday, January 10, 2013 11:00 PM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Limitation of key-value pairs for a particular key.

There isn't any limit like that. Can you reproduce this consistently? If so,
please file a ticket.

It will definitely help if you can provide a test case which can reproduce
this issue.

Thanks,
+Vinod

On Thu, Jan 10, 2013 at 12:41 AM, Utkarsh Gupta <Utkarsh_Gupta@infosys.com> wrote:

Hi,

I am using Apache Hadoop 1.0.4 on a 10-node cluster of commodity machines
running Ubuntu 12.04 Server edition. I am having an issue with my MapReduce
code. While debugging, I found that the reducer can take 262145 values for a
particular key. If there are more values, they seem to be corrupted. I
checked the values when emitting them from the map and again in the reducer.

I am wondering whether Hadoop has any such limitation or whether this is a
configuration problem.

Thanks and Regards
Utkarsh Gupta

RE: Limitation of key-value pairs for a particular key.

Posted by Utkarsh Gupta <Ut...@infosys.com>.

You are right. Actually, we were expecting the values to be sorted.

We tried to reproduce the problem with this simple code:

    // Fields and map() of a Mapper<LongWritable, Text, IntWritable, Text>.
    private final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        int N = 30000;
        // Emit 0..N-1 in order, all under the same key.
        for (int i = 0; i < N; i++) {
            word.set(Integer.toString(i));
            System.out.println(i);
            context.write(one, word);
        }
    }

For smaller N the numbers were in order, but for N = 3000000 the order was not maintained.
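
To pin down where the order first breaks on the reducer side, a check along the following lines may help. It is a plain-Java sketch with hypothetical names (OrderCheck, firstOutOfOrder) and takes a list of strings in place of the reducer's values, on the assumption that each value is a decimal integer as in the code above.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative helper; not part of Hadoop. */
final class OrderCheck {
    /**
     * Returns the index of the first value that is smaller than its
     * predecessor, or -1 if the values are in non-decreasing order.
     */
    static int firstOutOfOrder(List<String> values) {
        for (int i = 1; i < values.size(); i++) {
            int previous = Integer.parseInt(values.get(i - 1));
            int current = Integer.parseInt(values.get(i));
            if (current < previous) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("0", "1", "2", "5", "3", "4");
        System.out.println(firstOutOfOrder(values)); // prints 4
    }
}
```

Comparing the reported index across runs with different N would show whether the break always falls at the same record count.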

RE: Limitation of key-value pairs for a particular key.

Posted by Harsh J <ha...@cloudera.com>.

We don't sort values (only keys) nor apply any manual limits in MR. Can
you post a reproducible test case to support your suspicion?
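
Since only keys are sorted, one common way to get values in a defined order is the secondary-sort pattern: fold the value into a composite key so that the key sort does the ordering, then group by the original key. The sketch below imitates that idea in plain Java without any Hadoop API. All names in it (SecondarySortSketch, Pair, sortAndGroup) are hypothetical, and in a real job the ordering and grouping would be supplied through a custom key type and comparators, which are not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java imitation of the secondary-sort idea; not Hadoop code. */
final class SecondarySortSketch {
    /** A (key, value) pair standing in for a composite map output key. */
    static final class Pair {
        final int key;
        final int value;

        Pair(int key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Sorts by key and then by value, and groups the values by key. */
    static Map<Integer, List<Integer>> sortAndGroup(List<Pair> records) {
        List<Pair> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.<Pair>comparingInt(p -> p.key)
                .thenComparingInt(p -> p.value));
        // LinkedHashMap keeps the groups in sorted-key order.
        Map<Integer, List<Integer>> grouped = new LinkedHashMap<>();
        for (Pair p : sorted) {
            grouped.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Pair> records = Arrays.asList(
                new Pair(1, 3), new Pair(2, 1), new Pair(1, 0), new Pair(1, 2));
        System.out.println(sortAndGroup(records)); // prints {1=[0, 2, 3], 2=[1]}
    }
}
```

The simpler alternative, when the values for one key fit in memory, is to collect them in the reducer and sort them there.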

RE: Limitation of key-value pairs for a particular key.

Posted by Utkarsh Gupta <Ut...@infosys.com>.

Hi,
Thanks for the response. There were some issues with my code. I have checked that in detail.
All the values from the map are present in the reducer, but not in sorted order. This happens when the number of values for a key is too large.

Thanks
Utkarsh

Re: Limitation of key-value pairs for a particular key.

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

There isn't any limit like that. Can you reproduce this consistently? If
so, please file a ticket.

It will definitely help if you can provide a test case which can reproduce
this issue.

Thanks,
+Vinod


-- 
+Vinod
Hortonworks Inc.
http://hortonworks.com/