Posted to issues@hive.apache.org by "Yongzhi Chen (JIRA)" <ji...@apache.org> on 2016/12/05 23:03:58 UTC

[jira] [Commented] (HIVE-15359) skip.footer.line.count doesnt work properly for certain situations

    [ https://issues.apache.org/jira/browse/HIVE-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723657#comment-15723657 ] 

Yongzhi Chen commented on HIVE-15359:
-------------------------------------

The current skip-footer feature needs each file to map to exactly one split in order to work properly. The split must be a single one not only logically but also physically, which means the file has to be unsplittable. The issue can be reproduced with a data file of size 140M. In Hadoop, the file is stored in two blocks with lengths 128M and 12M, where 128M is dfs.block.size. For this query, Hive uses CombineHiveInputSplit to handle the splits. Logically there is only one CombineHiveInputSplit (and so one mapper), but the split has two paths (the same path with different start positions and lengths: 128M and 12M).
When CombineHiveRecordReader uses the split, it generates two FileSplits, one for each block. The code in HiveContextAwareRecordReader that handles footer skipping assumes each FileSplit is a physically independent file, so it skips the footer in the first block and does nothing in the second block. As a result, some records in the middle of the file are wrongly skipped as the footer, and the real footer is still in the result.
Fix the issue by transferring the footer buffer across FileSplits of the same file, which makes footer skipping work correctly in the one-mapper case.
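
To illustrate the behavior described above, here is a minimal Python sketch. It is a simplified model only, not Hive's actual code: the function names (read_per_split, read_with_shared_buffer) and the tiny in-memory "splits" are hypothetical stand-ins for the FileSplits of one file and for the footer buffer.

```python
from collections import deque


def read_per_split(splits, footer_count):
    """Model of the faulty behavior: the footer is skipped at the end of the
    first split only, and later splits of the same file are emitted as-is."""
    out = []
    for index, split in enumerate(splits):
        if index > 0:
            # Nothing is done for later splits, so the real footer leaks out.
            out.extend(split)
            continue
        buf = deque()
        for line in split:
            buf.append(line)
            if len(buf) > footer_count:
                out.append(buf.popleft())
        # Lines still in buf are dropped here as if they were the footer.
    return out


def read_with_shared_buffer(splits, footer_count):
    """Model of the fix: one footer buffer is carried across all splits of the
    same file, so only the last footer_count lines of the file are withheld."""
    out = []
    buf = deque()
    for split in splits:
        for line in split:
            buf.append(line)
            if len(buf) > footer_count:
                out.append(buf.popleft())
    return out


if __name__ == "__main__":
    rows = ["r1", "r2", "r3", "r4", "r5", "FOOTER"]
    # Hypothetical split boundary standing in for the 128M / 12M block boundary.
    splits = [rows[:4], rows[4:]]
    print(read_per_split(splits, 1))
    print(read_with_shared_buffer(splits, 1))
```

With one footer line and the split boundary after r4, the first function drops r4 and keeps FOOTER, while the second returns r1 through r5 and withholds only FOOTER.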

> skip.footer.line.count doesnt work properly for certain situations
> ------------------------------------------------------------------
>
>                 Key: HIVE-15359
>                 URL: https://issues.apache.org/jira/browse/HIVE-15359
>             Project: Hive
>          Issue Type: Bug
>          Components: Reader
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>
> The steps to reproduce this issue are very similar to HIVE-12718, but the data file is larger than 128M. In this case, even when only one mapper is used, the footer is still skipped incorrectly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)