Posted to issues@hive.apache.org by "Qinghui Xu (JIRA)" <ji...@apache.org> on 2018/08/13 09:23:00 UTC

[jira] [Commented] (HIVE-20254) CheckNonCombinablePathCallable is buggy

    [ https://issues.apache.org/jira/browse/HIVE-20254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578003#comment-16578003 ] 

Qinghui Xu commented on HIVE-20254:
-----------------------------------

I think we should backport this fix to version 1.1.0.

> CheckNonCombinablePathCallable is buggy
> ---------------------------------------
>
>                 Key: HIVE-20254
>                 URL: https://issues.apache.org/jira/browse/HIVE-20254
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Qinghui Xu
>            Priority: Major
>
> CombineHiveInputFormat lets users prevent some of their inputs from being combined (by implementing AvoidSplitCombination).
> We spotted a problem with this when a query reads a lot of partitions (more than 100). When there are more than 100 input paths, the combinability check is run in parallel:
>  * the input path array is divided into several chunks (each chunk with no more than 100 paths)
>  * each chunk is submitted to a CheckNonCombinablePathCallable
>  * each CheckNonCombinablePathCallable returns a set of indices of the paths that must not be combined
> The problem is that CheckNonCombinablePathCallable returns relative indices (positions inside the chunk) instead of absolute indices (positions in the full array). The returned indices are therefore always smaller than 100, so paths at positions 100 and beyond are never taken into account when deciding which inputs to avoid combining.
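
To illustrate the relative-versus-absolute index problem described above, here is a minimal, self-contained sketch. It is not Hive's actual code: the class ChunkedIndexSketch, the methods checkChunk and findNonCombinable, the Predicate standing in for the AvoidSplitCombination check, and the thread pool size are all hypothetical. The "absolute" flag switches between the behaviour described in the report (chunk-relative indices) and what a fix would presumably do (add the chunk's start offset).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical sketch; names and structure are assumptions, not Hive's implementation.
final class ChunkedIndexSketch {
    // Chunk size taken from the issue description (no more than 100 paths per chunk).
    static final int CHUNK_SIZE = 100;

    // Stand-in for one per-chunk check over paths[start .. start + length).
    static Callable<Set<Integer>> checkChunk(String[] paths, int start, int length,
                                             Predicate<String> nonCombinable,
                                             boolean absolute) {
        return () -> {
            Set<Integer> result = new HashSet<>();
            for (int i = 0; i < length; i++) {
                if (nonCombinable.test(paths[start + i])) {
                    // Reported behaviour: i (relative to the chunk).
                    // Presumed fix: start + i (position in the full array).
                    result.add(absolute ? start + i : i);
                }
            }
            return result;
        };
    }

    // Splits the array into chunks, checks them in parallel, and merges the index sets.
    static Set<Integer> findNonCombinable(String[] paths,
                                          Predicate<String> nonCombinable,
                                          boolean absolute) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Set<Integer>>> futures = new ArrayList<>();
            for (int start = 0; start < paths.length; start += CHUNK_SIZE) {
                int length = Math.min(CHUNK_SIZE, paths.length - start);
                futures.add(pool.submit(
                        checkChunk(paths, start, length, nonCombinable, absolute)));
            }
            Set<Integer> merged = new HashSet<>();
            for (Future<Set<Integer>> future : futures) {
                merged.addAll(future.get());
            }
            return merged;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With 250 paths and only the path at position 150 marked as non-combinable, the relative variant reports index 50 (its position inside the second chunk), so the wrong path is flagged and the intended one is missed; the absolute variant reports 150.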



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)