Posted to dev@hive.apache.org by "Ning Zhang (JIRA)" <ji...@apache.org> on 2010/08/19 20:57:16 UTC

[jira] Commented: (HIVE-1567) increase hive.mapjoin.maxsize to 10 million

    [ https://issues.apache.org/jira/browse/HIVE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900397#action_12900397 ] 

Ning Zhang commented on HIVE-1567:
----------------------------------

The hive.mapjoin.maxsize parameter exists to limit memory consumption, not to improve speed. We saw OOM (out-of-memory) exceptions quite often before this parameter was introduced. Increasing it blindly is risky. A better approach may be to estimate how many rows fit into memory, based on the row size and the available memory, and then adjust the parameter automatically.
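The automatic adjustment suggested above can be sketched as follows. The sketch assumes that an average in-memory row size, including Java object overhead, is known or can be estimated. It also assumes that a fixed fraction of the task heap is reserved for the map-join hash table. MapJoinSizeEstimator, estimateMaxRows, memoryFraction and hardCap are hypothetical names, not part of Hive. Runtime.maxMemory() is the standard JVM call for the maximum heap size.

```java
/**
 * Hypothetical helper (not part of Hive): estimates how many small-table rows
 * a map-side join could hold in memory, instead of relying on a fixed
 * hive.mapjoin.maxsize value.
 */
public final class MapJoinSizeEstimator {

    private MapJoinSizeEstimator() {}

    /**
     * Returns how many rows of avgRowBytes fit into memoryFraction of heapBytes,
     * never exceeding hardCap. A result of 0 means not even one row fits.
     */
    public static long estimateMaxRows(long heapBytes, double memoryFraction,
                                       long avgRowBytes, long hardCap) {
        if (heapBytes <= 0 || avgRowBytes <= 0 || hardCap < 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        if (memoryFraction <= 0.0 || memoryFraction > 1.0) {
            throw new IllegalArgumentException("memoryFraction must be in (0, 1]");
        }
        // Memory budget for the in-memory table (the double-to-long cast saturates).
        long budget = (long) (heapBytes * memoryFraction);
        long rows = budget / avgRowBytes;
        return Math.min(rows, hardCap);
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory(); // max heap this JVM may use
        long avgRowBytes = 2_048L;  // assumed in-memory size of one wide row
        double fraction = 0.5;      // assumed share of heap for the join table
        long maxRows = estimateMaxRows(heap, fraction, avgRowBytes, 10_000_000L);
        System.out.println("estimated max rows: " + maxRows);
    }
}
```

For example, with a 1 GiB heap, half of it reserved and 2 KiB per row, the estimate is 2^29 / 2^11 = 262,144 rows. That is above the 100k default but far below 10 million, which shows why a fixed value is hard to choose for every table.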

> increase hive.mapjoin.maxsize to 10 million
> -------------------------------------------
>
>                 Key: HIVE-1567
>                 URL: https://issues.apache.org/jira/browse/HIVE-1567
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: He Yongqiang
>
> I saw that on a very wide table, Hive can process 1 million rows in less than one minute (selecting all columns).
> Setting hive.mapjoin.maxsize to 100k is somewhat too restrictive. Let's increase it to 10 million.

-- 
This message is automatically generated by JIRA.