
Posted to dev@hive.apache.org by "Carl Steinbach (Commented) (JIRA)" <ji...@apache.org> on 2012/03/20 23:35:45 UTC

[jira] [Commented] (HIVE-2877) TABLESAMPLE(x PERCENT) tests fail on 0.22/0.23

    [ https://issues.apache.org/jira/browse/HIVE-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233888#comment-13233888 ] 

Carl Steinbach commented on HIVE-2877:
--------------------------------------

There are two distinct problems:

1) Many of the queries in split_sample.q and sample_islocalmode_hook.q are nondeterministic. This can be fixed by adding ORDER BY clauses.

2) The second problem is more serious. Both tests set mapred.max.split.size=300 and hive.merge.smallfiles.avgsize=1 in an effort to force the generation of multiple splits and multiple output files. However, Hadoop 0.20 cannot generate splits smaller than the block size when using CombineFileInputFormat, so only one split is generated. This significantly changes the results of the TABLESAMPLE(x PERCENT) queries. The issue was fixed in MAPREDUCE-2046, which is included in 0.23.

Suggested Fixes: 
# Make the queries deterministic
# Restrict these tests to Hadoop versions >= 0.22
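
A minimal sketch of the first fix, for illustration only. The table name ss_src and the column name key are hypothetical and are not taken from split_sample.q or sample_islocalmode_hook.q; only the two settings come from the description above.

```sql
-- Settings the tests use to try to force multiple splits and output files.
set mapred.max.split.size=300;
set hive.merge.smallfiles.avgsize=1;

-- Nondeterministic: rows may come back in any order.
-- (ss_src and key are hypothetical names.)
SELECT key FROM ss_src TABLESAMPLE(1 PERCENT);

-- Deterministic output order: the same sampled rows always print in the same order.
SELECT key FROM ss_src TABLESAMPLE(1 PERCENT) ORDER BY key;
```

ORDER BY only fixes the order of the output. Which rows are sampled still depends on how many splits are generated, so it does not address the second problem.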

                
> TABLESAMPLE(x PERCENT) tests fail on 0.22/0.23
> ----------------------------------------------
>
>                 Key: HIVE-2877
>                 URL: https://issues.apache.org/jira/browse/HIVE-2877
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Carl Steinbach
>            Assignee: Carl Steinbach
>

