Posted to dev@orc.apache.org by Piyush Mukati <pi...@gmail.com> on 2017/12/18 13:38:18 UTC

getting read past EOF for Double column

Hi,
I wrote an ORC file with a MapReduce job, but while reading the file
I am getting "read past EOF for a double column".
After debugging, I found that the reader is trying to read an empty stream.
I suspect the file metadata is corrupt.

The column metadata says:
*Column 36: count: 0 hasNull: false sum: 0.0*
I don't understand how count can be zero while hasNull is false,
when other columns have non-zero counts.

I am out of ideas for debugging. Please point me in the direction I should
debug further.
Please find attached the metadata and the stack trace.
Thanks.

Re: getting read past EOF for Double column

Posted by Owen O'Malley <ow...@gmail.com>.
I've filed this as https://issues.apache.org/jira/browse/ORC-285 . Sorry
for the delay in getting the fix out.

.. Owen

On Mon, Dec 18, 2017 at 10:27 AM, Owen O'Malley <ow...@gmail.com>
wrote:

> This is a bug. Please file a jira. It looks like a change went in that
> made the DoubleTreeReader fail if it is called on a batch of size 0.
>
> Thanks,
>    Owen

Re: getting read past EOF for Double column

Posted by Owen O'Malley <ow...@gmail.com>.
This is a bug. Please file a jira. It looks like a change went in that made
the DoubleTreeReader fail if it is called on a batch of size 0.
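
As a rough illustration of this failure mode, here is a minimal toy sketch in Python. It is not ORC's Java code: ToyDoubleReader, next_batch_eager and next_batch are hypothetical names, and the eager first read is only an assumed cause, since the thread does not say what the actual change was. It shows how a reader that touches its stream before checking the batch size fails on an empty stream, while a guarded reader returns an empty batch.

```python
import io
import struct


class ToyDoubleReader:
    """Hypothetical stand-in for a double column reader (not ORC's API)."""

    def __init__(self, data: bytes):
        self.stream = io.BytesIO(data)

    def _read_one(self) -> float:
        raw = self.stream.read(8)
        if len(raw) < 8:
            raise EOFError("read past EOF")
        return struct.unpack("<d", raw)[0]

    def next_batch_eager(self, batch_size: int) -> list:
        # Assumed bug: the first value is read before the batch size is
        # checked, so an empty stream fails even when batch_size == 0.
        first = self._read_one()
        return [first] + [self._read_one() for _ in range(batch_size - 1)]

    def next_batch(self, batch_size: int) -> list:
        # Guarded version: a batch of size 0 never touches the stream.
        if batch_size == 0:
            return []
        return [self._read_one() for _ in range(batch_size)]


if __name__ == "__main__":
    print(ToyDoubleReader(b"").next_batch(0))
    try:
        ToyDoubleReader(b"").next_batch_eager(0)
    except EOFError as err:
        print("eager reader failed:", err)
```

In this toy, the guard lives in the leaf reader itself, so the leaf stays safe even if a parent reader calls it with a batch of size 0.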

Thanks,
   Owen

On Mon, Dec 18, 2017 at 10:19 AM, Owen O'Malley <ow...@gmail.com>
wrote:

> Actually, the metadata is reasonable; it is just that there is an array
> above that column that doesn't have any elements.
>
> So the tree down to column 36 looks like:
>
> column 0: (struct) count: 42692
> column 1: data (struct) count: 42692
> column 21: listingAssociated (array) count: 42692
> column 22: (struct) count: 0
> column 32: sla (array) count: 0
> column 33: (struct) count: 0
> column 34: shippingTier (struct) count: 0
> column 35: charge (struct) count: 0
> column 36: amount (double) count: 0
>
> Since there are 0 instances of column 22, there aren't any instances below
> it. So what should happen is that the reader doesn't call down to read
> the data, because there are no values.
>
> Which version of ORC are you using to read with?
>
> Thanks,
>    Owen

Re: getting read past EOF for Double column

Posted by Owen O'Malley <ow...@gmail.com>.
Actually, the metadata is reasonable; it is just that there is an array
above that column that doesn't have any elements.

So the tree down to column 36 looks like:

column 0: (struct) count: 42692
column 1: data (struct) count: 42692
column 21: listingAssociated (array) count: 42692
column 22: (struct) count: 0
column 32: sla (array) count: 0
column 33: (struct) count: 0
column 34: shippingTier (struct) count: 0
column 35: charge (struct) count: 0
column 36: amount (double) count: 0

Since there are 0 instances of column 22, there aren't any instances below
it. So what should happen is that the reader doesn't call down to read
the data, because there are no values.
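
The idea can be sketched with a small toy model in Python. This is not ORC's reader code: ToyLeafReader and ToyListReader are hypothetical names, and the leaf is deliberately written to fail whenever it is asked to read from an empty stream. The list reader sums the list lengths in the batch and only calls down to its child when that total is greater than zero.

```python
class ToyLeafReader:
    """Hypothetical leaf reader that fails if called on an empty stream."""

    def __init__(self, values):
        self.values = list(values)
        self.calls = 0

    def next_batch(self, n):
        self.calls += 1
        if not self.values:
            raise EOFError("read past EOF")
        out, self.values = self.values[:n], self.values[n:]
        return out


class ToyListReader:
    """Hypothetical array reader: one length per row, values in a child."""

    def __init__(self, lengths, child):
        self.lengths = list(lengths)
        self.child = child
        self.pos = 0

    def next_batch(self, batch_size):
        lengths = self.lengths[self.pos:self.pos + batch_size]
        self.pos += len(lengths)
        total = sum(lengths)  # number of child instances in this batch
        # Only call down to the child when there is something to read.
        flat = self.child.next_batch(total) if total > 0 else []
        rows, offset = [], 0
        for length in lengths:
            rows.append(flat[offset:offset + length])
            offset += length
        return rows


if __name__ == "__main__":
    leaf = ToyLeafReader([])            # empty stream, like column 36
    reader = ToyListReader([0] * 5, leaf)
    print(reader.next_batch(5), "child calls:", leaf.calls)
```

With every list empty, the child count is zero and the leaf is never called, which matches a count of 0 for every column below the array.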

Which version of ORC are you using to read with?

Thanks,
   Owen

