Posted to dev@hive.apache.org by Mohit <mo...@huawei.com> on 2011/03/01 16:08:45 UTC

FW: Regarding HIVE-1737

Hi Namit/Siying,

 

OK, I agree with your analysis: both the fixed and variable row sizes are
evaluated wrongly here.

 

But what I was more interested in is how critical the change is: if the hash
aggregation map is not flushed even after the number of existing entries
overshoots the (incorrectly calculated) entry estimate derived from the
configured property hive.map.aggr.map.percentmemory (whereas with your code
changes the flush is triggered faithfully), can anything bad happen apart
from an out-of-memory error in the child JVM, or is there more to it?
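
For context, the flush decision being discussed can be sketched roughly as follows. This is a minimal, illustrative sketch, not Hive's actual code: the names `maxEntries`, `fixedRowSize`, and `estimatedVariableSize` are invented for the example. The point is that if the per-row size is underestimated, the computed entry limit comes out too high and the flush triggers too late:

```java
public class HashAggrFlushSketch {
    // Illustrative only: mirrors the idea behind hive.map.aggr.map.percentmemory.
    // Budget a fraction of the heap for the hash map, then divide by the
    // estimated bytes per entry to get the maximum number of entries.
    static long maxEntries(long maxHeapBytes, double percentMemory,
                           long fixedRowSize, long estimatedVariableSize) {
        long perEntryBytes = fixedRowSize + estimatedVariableSize;
        return (long) (maxHeapBytes * percentMemory) / perEntryBytes;
    }

    static boolean shouldFlush(long currentEntries, long maxEntries) {
        return currentEntries >= maxEntries;
    }

    public static void main(String[] args) {
        long heap = 512L * 1024 * 1024;           // 512 MB child JVM heap
        // Correct estimate: ~1 KB per entry gives a limit of ~131k entries.
        long okLimit = maxEntries(heap, 0.25, 64, 960);
        // Underestimated row size (the bug): ~100 B per entry inflates the
        // limit roughly tenfold, so the map grows far past the heap budget.
        long badLimit = maxEntries(heap, 0.25, 64, 36);
        System.out.println(okLimit + " vs " + badLimit);
        // With 200,000 entries actually in the map, the correct limit flushes
        // but the buggy one does not.
        System.out.println(shouldFlush(200_000, okLimit));
        System.out.println(shouldFlush(200_000, badLimit));
    }
}
```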

 

If you can provide me pointers to reproduce its side effects, that would be
great.

 

-Mohit

 

***************************************************************************************
This e-mail and its attachments contain confidential information from HUAWEI,
which is intended only for the person or entity whose address is listed
above. Any use of the information contained herein in any way (including,
but not limited to, total or partial disclosure, reproduction, or
dissemination) by persons other than the intended recipient(s) is
prohibited. If you receive this e-mail in error, please notify the sender by
phone or email immediately and delete it!

  _____  

From: Mohit [mailto:mohitsikri@huawei.com] 
Sent: Tuesday, March 01, 2011 12:39 PM
To: 'siying.d@fb.com'
Subject: Regarding HIVE-1737

 

Hi Siying, 

 

Hope you are doing well.

I have a request regarding this defect: I'm not able to understand the
issue, and hence cannot reproduce it.

Maybe you can help with that; I need to know what queries you ran.

 

-Mohit

 


 


RE: Regarding HIVE-1737

Posted by Siying Dong <si...@fb.com>.
Hi Mohit,



Can you be more precise about how the fixed and variable row sizes are evaluated wrongly? I don't quite understand what you mean. Did I miss some context?



I guess you are running a previous version and trying to figure out whether you need to port this patch? In that case, I think OOM is the worst possible outcome. We also care about whether one task uses more resources than it really needs and competes for resources with other tasks; I don't think there can be any other impact. If you want to reproduce an OOM, you should create a condition where the sum of the distinct string key sizes is greater than the maximum heap size, while the fixed size plus the aggregate parameter size is much smaller than the average key size. You can try very long distinct string keys as input and group by them. My feeling is that this is not a common case, since we never hit OOM because of it.
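
To make that repro condition concrete, a back-of-the-envelope check of the inequality described above (the numbers and the factor chosen for "much smaller" are illustrative, not from Hive):

```java
public class OomConditionSketch {
    // Restating the condition: the true memory footprint of the distinct keys
    // exceeds the heap, while the (fixed + aggregate parameter) size used in
    // the entry estimate is much smaller than the real average key size.
    static boolean oomPossible(long distinctKeys, long avgKeyBytes,
                               long maxHeapBytes, long fixedPlusAggrBytes) {
        boolean keysExceedHeap = distinctKeys * avgKeyBytes > maxHeapBytes;
        // "Much smaller" taken here, arbitrarily, as at least 10x smaller.
        boolean estimateTooSmall = fixedPlusAggrBytes * 10 < avgKeyBytes;
        return keysExceedHeap && estimateTooSmall;
    }

    public static void main(String[] args) {
        long heap = 256L * 1024 * 1024;   // 256 MB child JVM
        // e.g. GROUP BY on ~10 KB string keys with 50,000 distinct groups:
        System.out.println(oomPossible(50_000, 10_000, heap, 64));
        // Short (~20 B) keys: the same group count fits comfortably:
        System.out.println(oomPossible(50_000, 20, heap, 64));
    }
}
```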



For the current trunk or version 0.7, the code is no longer the same as when we did HIVE-1737, since we now have HIVE-1830, which adds a memory usage check and forces a flush to disk when memory usage exceeds a threshold; so even without HIVE-1737, there won't be an OOM anyway.
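
The HIVE-1830 safety net mentioned above amounts to checking actual JVM memory usage directly, independent of the entry-count estimate. A simplified sketch (the method name and threshold value are illustrative, not Hive's actual identifiers):

```java
public class MemoryThresholdSketch {
    // Illustrative version of a direct memory check: flush when used heap
    // exceeds a fixed fraction of the maximum heap, regardless of how many
    // entries the (possibly wrong) row-size estimate says should still fit.
    static boolean memoryThresholdReached(long usedBytes, long maxBytes,
                                          double threshold) {
        return (double) usedBytes / maxBytes > threshold;
    }

    public static void main(String[] args) {
        // Actual usage would be sampled from the JVM, e.g.:
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        System.out.println(memoryThresholdReached(used, rt.maxMemory(), 0.9));
        // A map that has consumed 95% of the heap triggers a flush at a
        // 0.9 threshold, even if the entry estimate says there is room left.
        System.out.println(memoryThresholdReached(95, 100, 0.9));
    }
}
```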



Thanks,



Siying



________________________________
From: Mohit [mohitsikri@huawei.com]
Sent: Tuesday, March 01, 2011 7:08 AM
To: Siying Dong
Cc: Namit Jain; chinnarao@huawei.com; hive-dev@hadoop.apache.org
Subject: FW: Regarding HIVE-1737

Hi Namit/Siying,

OK, I agree with your analysis: both the fixed and variable row sizes are evaluated wrongly here.

But what I was more interested in is how critical the change is: if the hash aggregation map is not flushed even after the number of existing entries overshoots the (incorrectly calculated) entry estimate derived from the configured property hive.map.aggr.map.percentmemory (whereas with your code changes the flush is triggered faithfully), can anything bad happen apart from an out-of-memory error in the child JVM, or is there more to it?

If you can provide me pointers to reproduce its side effects, that would be great.

-Mohit

________________________________
From: Mohit [mailto:mohitsikri@huawei.com]
Sent: Tuesday, March 01, 2011 12:39 PM
To: 'siying.d@fb.com'
Subject: Regarding HIVE-1737

Hi Siying,

Hope you are doing well.
I have a request regarding this defect: I'm not able to understand the issue, and hence cannot reproduce it.
Maybe you can help with that; I need to know what queries you ran.

-Mohit
