Posted to dev@hive.apache.org by "Shrijeet Paliwal (Updated) (JIRA)" <ji...@apache.org> on 2012/03/13 23:44:40 UTC
[jira] [Updated] (HIVE-2869) Merging small files throws RuntimeException when hive.mergejob.maponly=false

     [ https://issues.apache.org/jira/browse/HIVE-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shrijeet Paliwal updated HIVE-2869:
-----------------------------------

    Attachment: data_to_reproduce.tar.gz
    
> Merging small files throws RuntimeException when hive.mergejob.maponly=false
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-2869
>                 URL: https://issues.apache.org/jira/browse/HIVE-2869
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.8.0
>         Environment: CentOS release 5.5 (Final)
>            Reporter: Shrijeet Paliwal
>         Attachments: data_to_reproduce.tar.gz
>
>
> Hive version: Hive 0.8 (last commit SHA b581a6192b8d4c544092679d05f45b2e50d42b45)
> Hadoop version: cdh3u0
> I am trying to use Hive's small-file merge feature by setting all the necessary parameters.
> I have disabled CombineHiveInputFormat because my input is compressed text.
> {noformat}
> hive> set mapred.min.split.size.per.node=1000000000;
> hive> set mapred.min.split.size.per.rack=1000000000;
> hive> set mapred.max.split.size=1000000000;
> hive> set hive.merge.size.per.task=1000000000;
> hive> set hive.merge.smallfiles.avgsize=1000000000;
> hive> set hive.merge.size.smallfiles.avgsize=1000000000;
> hive> set hive.merge.mapfiles=true;
> hive> set hive.merge.mapredfiles=true;
> hive> set hive.mergejob.maponly=false;
> {noformat}
> The plan launches two MR jobs, but after the first job succeeds I get a runtime error:
> "java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but reduce operator specified"
> *How to reproduce:*
> * Create the tables as follows:
> {code}
> --create input table
> create table tmp_notmerged (
>   id                int,
>   name              string
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE;
> --create output table
> create table tmp_merged (
>   id                int
> )
> STORED AS TEXTFILE;
> {code}
> * Load data into tmp_notmerged (the files are attached to this JIRA)
> * Set the merge options and run the Hive query:
> {code}
> set hive.merge.mapfiles=true;
> set hive.mergejob.maponly=false;
> insert overwrite table tmp_merged select id from tmp_notmerged;
> {code}
> * You should see the error "java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but reduce operator specified"
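> The load step in the list above can be sketched as follows. This is only an illustration: it assumes the attachment has been extracted to a local directory named /tmp/data_to_reproduce, which is a hypothetical path (the actual layout of the archive is not described here).
> {code}
> -- Assumption: /tmp/data_to_reproduce is a placeholder for wherever
> -- data_to_reproduce.tar.gz was extracted on the client machine.
> -- Pointing LOAD DATA at a directory loads every file inside it.
> LOAD DATA LOCAL INPATH '/tmp/data_to_reproduce' INTO TABLE tmp_notmerged;
> {code}
> Loading several small files this way should leave tmp_notmerged with multiple input files, which is the situation the merge step is meant to handle.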
> *Proposed fix:*
> The patch is here: https://gist.github.com/2025303
