Posted to issues@hive.apache.org by "Ashutosh Chauhan (JIRA)" <ji...@apache.org> on 2017/06/09 18:14:18 UTC

[jira] [Commented] (HIVE-16870) Give Hive the ability to suppress output of empty files

    [ https://issues.apache.org/jira/browse/HIVE-16870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044789#comment-16044789 ] 

Ashutosh Chauhan commented on HIVE-16870:
-----------------------------------------

Dupe of HIVE-13040 ?

> Give Hive the ability to suppress output of empty files
> -------------------------------------------------------
>
>                 Key: HIVE-16870
>                 URL: https://issues.apache.org/jira/browse/HIVE-16870
>             Project: Hive
>          Issue Type: Improvement
>          Components: StorageHandler
>            Reporter: Stephen Measmer
>
> Today some Hive queries that use joins can output zero-byte files, particularly on large joins.  This can have a negative effect on HDFS, as it can lead to too many small files [1].
> A Cloudera Community thread [2] suggests using LazyOutputFormat as the OutputFormat, because with it MapReduce can be set to suppress the generation of empty (zero-byte) files.
> However, it's not possible to create a table in Hive with an OutputFormat of just LazyOutputFormat.  Below is what we found when testing:
> create table mytable (fip int, state string, zip string, level int) STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat';
> ------------
> Error: Error while compiling statement: FAILED: SemanticException [Error 10055]: Output Format must implement HiveOutputFormat, otherwise it should be either IgnoreKeyTextOutputFormat or SequenceFileOutputFormat (state=42000,code=10055)
> [1] http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> [2] https://community.cloudera.com/t5/Batch-Processing-and-Workflow/how-to-suppress-mapper-output-files-if-the-output-file-does-not/td-p/29540
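
The lazy-output idea referred to in the description can be sketched outside Hadoop. The Python below is only an illustration of the technique (create the output file when the first record arrives, so a task with no records leaves no file behind). It is not Hadoop's or Hive's implementation: LazyTextWriter and run_tasks are hypothetical names, and the part-NNNNN file naming is only loosely modeled on MapReduce output directories.

```python
import os
import tempfile


class LazyTextWriter:
    """Hypothetical writer that creates its file only on the first record."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # nothing exists on disk yet

    def write(self, record):
        if self._handle is None:
            # First record seen: only now is the output file created.
            self._handle = open(self.path, "w")
        self._handle.write(record + "\n")

    def close(self):
        # A writer that never received a record has nothing to close,
        # so no zero-byte file is left behind.
        if self._handle is not None:
            self._handle.close()
            self._handle = None


def run_tasks(out_dir, partitions):
    """Write one part file per partition; empty partitions produce no file."""
    for index, records in enumerate(partitions):
        writer = LazyTextWriter(os.path.join(out_dir, "part-%05d" % index))
        try:
            for record in records:
                writer.write(record)
        finally:
            writer.close()
    return sorted(os.listdir(out_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        # The middle partition is empty, so only two files should be listed.
        print(run_tasks(out_dir, [["a", "b"], [], ["c"]]))
```

In a plain MapReduce driver the equivalent step is to wrap a concrete format, for example LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class). Per the error quoted above, Hive does not accept LazyOutputFormat directly as a table's OutputFormat.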



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)