Posted to user@hive.apache.org by Biswajit Nayak <bi...@altiscale.com> on 2016/04/19 02:44:24 UTC

Re: Hive Cli ORC table read error with limit option

Hi All,

I seriously need help with this issue. Any reference or pointer to
troubleshoot or fix it would be helpful.

Regards
Biswa


Re: Hive Cli ORC table read error with limit option

Posted by Biswajit Nayak <bi...@altiscale.com>.

Thanks, Prasanth, for the update. I will test it and post the outcome
here.

Thanks
Biswa

On Tue, Apr 19, 2016 at 6:26 AM, Prasanth Jayachandran <
pjayachandran@hortonworks.com> wrote:

> Hi Biswajit
>
> You might need the patch from https://issues.apache.org/jira/browse/HIVE-11546
>
> Can you apply this patch to your Hive build and see if it solves the
> issue? (recommended)
>
> Alternatively, you can set hive.exec.orc.split.strategy=BI as a
> workaround. Using this config is strongly discouraged, as it disables
> split elimination and may generate sub-optimal splits, resulting in less
> map-side parallelism. It is provided only as a workaround and is
> suitable when all ORC files are small (less than the stripe size or
> block size).
>
> Thanks
> Prasanth

Re: Hive Cli ORC table read error with limit option

Posted by Prasanth Jayachandran <pj...@hortonworks.com>.

Hi Biswajit

You might need the patch from https://issues.apache.org/jira/browse/HIVE-11546

Can you apply this patch to your Hive build and see if it solves the issue? (recommended)

Alternatively, you can set hive.exec.orc.split.strategy=BI as a workaround.
Using this config is strongly discouraged, as it disables split elimination
and may generate sub-optimal splits, resulting in less map-side parallelism.
It is provided only as a workaround and is suitable when all ORC files
are small (less than the stripe size or block size).
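
As a rough illustration of the workaround above, the setting can be scoped to a single invocation instead of being changed globally. This is only a sketch: the helper name run_with_bi_splits and the sample query are hypothetical (the table and partition values are guessed from the file path quoted in this thread), while --hiveconf and -e are standard Hive CLI options.

```shell
# Hypothetical helper (not from the thread): runs one query with the
# BI split strategy applied to this invocation only.
run_with_bi_splits() {
  hive --hiveconf hive.exec.orc.split.strategy=BI -e "$1"
}

# Example call; the query is an assumption, replace it with the failing one:
# run_with_bi_splits "SELECT * FROM testdb.table_orc WHERE year=2016 AND month=1 AND day=29 LIMIT 10"
```

Scoping the override per invocation leaves the default strategy in place for other queries, which matters given the parallelism cost described above.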

Thanks
Prasanth

On Apr 18, 2016, at 7:44 PM, Biswajit Nayak <bi...@altiscale.com> wrote:

Hi All,

I seriously need help with this issue. Any reference or pointer to troubleshoot or fix it would be helpful.

Regards
Biswa

On Fri, Mar 25, 2016 at 11:24 PM, Biswajit Nayak <bi...@altiscale.com> wrote:
Prasanth,

Apologies for the delay in response.

Below is the orcfiledump of the empty orc file from a broken partition.

$ hive --orcfiledump /hive/testdb.db/table_orc/year=2016/month=1/day=29/000000_0
Structure for  /hive/testdb.db/table_orc/year=2016/month=1/day=29/000000_0
File Version: 0.12 with HIVE_8732
16/03/25 17:49:09 INFO orc.ReaderImpl: Reading ORC rows from  /hive/testdb.db/table_orc/year=2016/month=1/day=29/000000_0 with {include: null, offset: 0, length: 9223372036854775807}
16/03/25 17:49:09 INFO orc.RecordReaderFactory: Schema is not specified on read. Using file schema.
Rows: 0
Compression: SNAPPY
Compression size: 262144
Type: struct<>

Stripe Statistics:

File Statistics:
  Column 0: count: 0 hasNull: false

Stripes:

File length: 49 bytes
Padding length: 0 bytes
Padding ratio: 0%
$
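
The telltale signs in the dump above are "Rows: 0" together with an empty "Type: struct<>". A small filter like the following could flag such files when checking many partitions. It is a sketch under assumptions: the function name is hypothetical, and it relies on the output lines looking exactly like the ones shown above.

```shell
# Hypothetical helper: reads `hive --orcfiledump` output on stdin and
# succeeds only if the file has zero rows and an empty struct<> schema.
is_schemaless_empty_dump() {
  dump=$(cat)
  printf '%s\n' "$dump" | grep -q '^Rows: 0$' &&
    printf '%s\n' "$dump" | grep -q '^Type: struct<>$'
}

# Example call against the file from this thread:
# hive --orcfiledump /hive/testdb.db/table_orc/year=2016/month=1/day=29/000000_0 | is_schemaless_empty_dump && echo "empty, schemaless ORC file"
```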

I am still not able to figure out what is causing this odd behaviour.

Regards
Biswa

On Thu, Mar 10, 2016 at 3:12 PM, Prasanth Jayachandran <pj...@hortonworks.com> wrote:
Alternatively, you can send the orcfiledump output for the empty ORC file from the broken partition.

Thanks
Prasanth

On Mar 10, 2016, at 5:11 PM, Prasanth Jayachandran <pj...@hortonworks.com> wrote:

Could you attach the empty ORC files from one of the broken partitions somewhere? I can run some tests on them to see why it's happening.

Thanks
Prasanth

On Mar 8, 2016, at 12:02 AM, Biswajit Nayak <bi...@altiscale.com> wrote:

Both the parameters are set to false by default.

hive> set hive.optimize.index.filter;
hive.optimize.index.filter=false
hive> set hive.orc.splits.include.file.footer;
hive.orc.splits.include.file.footer=false
hive>

> I suspect this might be related to having 0 row files in the buckets not
> having any recorded schema.

Yes, there are a few files with 0 rows, but the query works on other partitions that also contain 0-row files. Out of 30 partitions (for a month), 3-4 have this issue. Even reloading the data does not help. The query works fine in MR now, but still has the issue in Tez.
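
Since the query succeeds on MR but not on Tez, running the same statement under each engine back to back can help confirm that the engine is the only variable. A minimal sketch, assuming the standard hive.execution.engine property; the helper name and the sample query are hypothetical.

```shell
# Hypothetical helper: runs a query under the given execution engine
# (for example "mr" or "tez") for this invocation only.
run_on_engine() {
  hive --hiveconf "hive.execution.engine=$1" -e "$2"
}

# Example comparison; the query is an assumption based on the path in this thread:
# q="SELECT * FROM testdb.table_orc WHERE year=2016 AND month=1 AND day=29 LIMIT 10"
# run_on_engine mr "$q"
# run_on_engine tez "$q"
```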

On Tue, Mar 8, 2016 at 2:43 AM, Gopal Vijayaraghavan <go...@apache.org> wrote:

> c                varchar(2)
...
> Num Buckets:         7

I suspect this might be related to having 0 row files in the buckets not
having any recorded schema.

You can also experiment with hive.optimize.index.filter=false, to see if
the zero row case is artificially produced via predicate push-down.

That shouldn't be a problem unless you've turned on
hive.orc.splits.include.file.footer=true (recommended to be false).

Your row-locations don't actually match any Apache source jar in my
builds. Are there any other patches to consider?

Cheers,
Gopal