Posted to dev@hive.apache.org by "Hive QA (JIRA)" <ji...@apache.org> on 2014/02/13 01:55:19 UTC

[jira] [Commented] (HIVE-6326) Split generation in ORC may generate wrong split boundaries because of unaccounted padded bytes

    [ https://issues.apache.org/jira/browse/HIVE-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899860#comment-13899860 ] 

Hive QA commented on HIVE-6326:
-------------------------------



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12628363/HIVE-6326.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5086 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1296/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1296/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12628363

> Split generation in ORC may generate wrong split boundaries because of unaccounted padded bytes
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-6326
>                 URL: https://issues.apache.org/jira/browse/HIVE-6326
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.13.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>              Labels: orcfile
>         Attachments: HIVE-6326.1.patch, HIVE-6326.2.patch
>
>
> HIVE-5091 added padding to ORC files to avoid ORC stripes straddling HDFS blocks. The length of these padded bytes is not stored in the stripe information. OrcInputFormat.getSplits() uses stripeInformation.getLength() for split computation. stripeInformation.getLength() is the sum of the index length, data length and stripe footer length. It does not account for the length of the padded bytes, which may result in a wrong split boundary.
> The fix for this is to use the offset of the next stripe to determine the length of the current stripe, so that the padded bytes are included as well.
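
The arithmetic behind the description above can be illustrated with a minimal sketch. This is not the code from the attached patch: the Stripe class, the spans() helper and the endOfStripes parameter are hypothetical names made up for illustration, and how the last stripe is bounded is an assumption, since the description does not say.

```java
import java.util.Arrays;
import java.util.List;

public class StripeSpanSketch {

    /** Hypothetical stand-in for ORC stripe metadata; NOT Hive's StripeInformation. */
    static final class Stripe {
        final long offset;
        final long indexLength;
        final long dataLength;
        final long footerLength;

        Stripe(long offset, long indexLength, long dataLength, long footerLength) {
            this.offset = offset;
            this.indexLength = indexLength;
            this.dataLength = dataLength;
            this.footerLength = footerLength;
        }

        /** Index + data + footer, as in the description; padding is not included. */
        long getLength() {
            return indexLength + dataLength + footerLength;
        }
    }

    /**
     * Bytes covered by each stripe including any padding after it, computed
     * from the offset of the next stripe. endOfStripes bounds the last stripe
     * (assumption: the caller knows where the stripe region ends).
     */
    static long[] spans(List<Stripe> stripes, long endOfStripes) {
        long[] out = new long[stripes.size()];
        for (int i = 0; i < stripes.size(); i++) {
            long next = (i + 1 < stripes.size())
                    ? stripes.get(i + 1).offset
                    : endOfStripes;
            out[i] = next - stripes.get(i).offset;
        }
        return out;
    }

    public static void main(String[] args) {
        // First stripe occupies bytes [3, 100); 28 padding bytes follow it,
        // so the second stripe starts at offset 128.
        List<Stripe> stripes = Arrays.asList(
                new Stripe(3, 10, 80, 7),
                new Stripe(128, 10, 100, 10));

        System.out.println("getLength of first stripe: " + stripes.get(0).getLength());
        System.out.println("spans with padding: " + Arrays.toString(spans(stripes, 248)));
    }
}
```

In this made-up layout, getLength() for the first stripe is 97 bytes, while the distance to the next stripe's offset is 125 bytes. A split boundary derived from the 97-byte figure would fall short of where the second stripe actually begins.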



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)