Posted to user@hive.apache.org by Mayank Bansal <Ma...@mu-sigma.com> on 2012/10/01 10:50:31 UTC

Percentile calculation

Hi,

I am trying to run the Hive UDF percentile on a column with around 116 million unique values.
The maximum memory I can give to the reducer is 12 GB, and the job keeps failing with a Java heap space error.
Is there a way to optimize this so that I don't encounter this error?
Or is there any other suggestion or solution that could help me out?

Thanks,
Mayank

________________________________
This email message may contain proprietary, private and confidential information. The information transmitted is intended only for the person(s) or entities to which it is addressed. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited and may be illegal. If you received this in error, please contact the sender and delete the message from your system.

Mu Sigma takes all reasonable steps to ensure that its electronic communications are free from viruses. However, given Internet accessibility, the Company cannot accept liability for any virus introduced by this e-mail or any attachment and you are advised to use up-to-date virus checking software.
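A rough back-of-envelope check of the numbers in the question, assuming the percentile aggregation keeps one in-memory entry per distinct value (the `counts` map visible in the error later in this thread suggests it does). The bytes-per-entry figures are illustrative guesses for a JVM hash map of boxed keys and counts, not measured Hive numbers; only the 116 million and 12 GB figures come from the question.

```python
# Back-of-envelope heap estimate for an aggregation that keeps one
# (value -> count) entry per distinct value inside a single reducer.
# ASSUMPTION: the bytes-per-entry figures are illustrative guesses for a hash
# map of boxed keys and counts on a 64-bit JVM, not measured Hive numbers.

DISTINCT_VALUES = 116_000_000  # from the question
REDUCER_HEAP_GB = 12           # from the question (treated as GiB here)


def estimated_heap_gib(distinct_values: int, bytes_per_entry: int) -> float:
    """Estimated heap, in GiB, needed just for the value -> count map."""
    return distinct_values * bytes_per_entry / 2**30


if __name__ == "__main__":
    for per_entry in (64, 96, 128):
        need = estimated_heap_gib(DISTINCT_VALUES, per_entry)
        verdict = "exceeds" if need > REDUCER_HEAP_GB else "fits within"
        print(f"{per_entry:>3} B/entry -> ~{need:.1f} GiB, {verdict} {REDUCER_HEAP_GB} GiB")
```

Under these guesses the map alone is of the same order of magnitude as the 12 GB heap, before counting the partial aggregates the reducer also has to deserialize and merge, so raising the heap further may only postpone the failure.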

RE: Percentile calculation

Posted by Mayank Bansal <Ma...@mu-sigma.com>.
Sorry for leaving out a very important piece of information.

I am running this query:

Select explode(percentile(i_xxx, array(0.01, 0.05, 0.5, 0.75, 0.95, 0.99))) from table;

This did not run, so I then tried:
Select percentile(i_xxx, array(0.05)) from table;

This also did not run.
The error I get is as follows:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":{"counts":{483733351:1,483733349:1,483733348:1,483733347:1,483733346:1,483733345:1,483733344:1,483733359:1,483733357:1,483733356:1,483733355:1,483733353:1,483733352:1,483733366:1,483733367:1,483733364:1,483733362:1,483733363:1,483733361:1,483733374:1,483733375:1,483733372:1,483733373:1,483733370:1,483733371:1,483733368:1,483733317:1,483733316:1,483733318:1,483733315:1,483733314:1,483733324:1,483733327:1,483733326:1,483733321:1,483733320:1,483733323:1,483733322:1,483733332:1,483733333:1,483733334:1,483733335:1,483733328:1,483733329:1,483733330:1,483733331:1,483733340:1,483733342:1,483733343:1,483733336:1,483733337:1,483733339:1,483733283:1,483733282:1,483733280:1,483733287:1,483733286:1,483733285:1,483733291:1,483733290:1,483733288:1,483733294:1,483733293:1,483733292:1,483733299:1,483733296:1,483733297:1,483733302:1,483733300:1,483733306:1,483733307:1,48373330

I searched for this error, and it indicates that the Java heap space was exhausted.
Before this error I was also getting a "GC overhead limit exceeded" error; I increased the reducer memory to 12 GB and then got the error above.
Could you please help me solve this problem?

Thanks,
Mayank
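The row in the error message carries a `counts` map from each value to its number of occurrences, which suggests the exact `percentile` aggregation holds every distinct value of `i_xxx` in memory; the empty `"key":{}` is consistent with a global aggregation (no GROUP BY), which is typically handled by a single reducer. The Python sketch below is a simplified model of computing exact percentiles from such a map, not Hive's actual implementation, and the linear interpolation at position (N - 1) * p is an assumption. It illustrates why requesting only `array(0.05)` did not help: memory is driven by the number of distinct values, not by how many percentiles are requested.

```python
import math
from bisect import bisect_right
from collections import Counter
from itertools import accumulate


def exact_percentiles(counts: dict[int, int], ps: list[float]) -> list[float]:
    """Exact percentiles from a non-empty value -> count map.

    Memory is O(number of distinct values): every key is held and sorted,
    no matter how many percentiles are requested.
    ASSUMPTION: linear interpolation between the values at ranks
    floor(pos) and ceil(pos), where pos = (N - 1) * p and N is the row count.
    """
    values = sorted(counts)                                   # all distinct values
    cumulative = list(accumulate(counts[v] for v in values))  # running row counts
    total = cumulative[-1]

    def value_at_rank(rank: int) -> int:
        # First distinct value whose running count exceeds the 0-based rank.
        return values[bisect_right(cumulative, rank)]

    result = []
    for p in ps:
        pos = (total - 1) * p
        lo_rank, hi_rank = math.floor(pos), math.ceil(pos)
        lo_val, hi_val = value_at_rank(lo_rank), value_at_rank(hi_rank)
        result.append(lo_val + (pos - lo_rank) * (hi_val - lo_val))
    return result


if __name__ == "__main__":
    # Five rows, four distinct values: the map has 4 entries, not 5.
    counts = Counter([1, 2, 2, 3, 10])
    print(len(counts), exact_percentiles(counts, [0.05, 0.5, 0.99]))
```

In this model, 116 million distinct values means a 116-million-entry map that must be held and sorted in one process, whichever percentiles are requested.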

-----Original Message-----
From: Mayank Bansal [mailto:Mayank.Bansal@mu-sigma.com]
Sent: Tuesday, October 02, 2012 6:23 PM
To: user@hive.apache.org
Subject: RE: Percentile calculation

I have an 11-node Hadoop cluster. The map phase runs, but the job fails in the reduce phase at 67% completion with a Java heap space out-of-memory error.
Could you please tell me what further information you need?

-----Original Message-----
From: MiaoMiao [mailto:liy099@gmail.com]
Sent: Tuesday, October 02, 2012 8:41 AM
To: user@hive.apache.org
Subject: Re: Percentile calculation

More info, please.

RE: Percentile calculation

Posted by Mayank Bansal <Ma...@mu-sigma.com>.
I have an 11-node Hadoop cluster. The map phase runs, but the job fails in the reduce phase at 67% completion with a Java heap space out-of-memory error.
Could you please tell me what further information you need?

Re: Percentile calculation

Posted by MiaoMiao <li...@gmail.com>.
More info, please.
