Posted to dev@parquet.apache.org by Mohammad Islam <mi...@yahoo.com.INVALID> on 2015/10/15 03:10:45 UTC

requestedSchema vs fileSchema

Hi,

Before I dig deep into the code, it would be really helpful if someone could help me with this issue.

My main question is: if both schemas are provided, which schema does the Parquet file reader use?
For example, I provide requestedSchema and readSupportMetadata in the ReadContext. It looks like the Parquet reader uses the requested schema to read the file.

In my case, my requested schema has a different data type for one column than the file schema: the field is "int" in the file schema but "bigint" in the requested schema. I got this exception:

"Failed with exception java.io.IOException:parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/tmp/my/my_year.pq"

If I use "int" in my requestedSchema instead, it works fine.

Any quick help or pointer is highly appreciated.

Regards,
Mohammad

Re: requestedSchema vs fileSchema

Posted by Cheng Lian <li...@gmail.com>.
Actually, the requested schema does not need to be a subset of the file
schema. If a field in the requested schema doesn't exist in the file
schema, Parquet fills that field with nulls, as long as the field is
optional.
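The null-filling behavior described above can be pictured with a small toy model. This is a sketch under stated assumptions, not parquet-mr code: `NullFillProjection`, `Field`, `Repetition` and `project` are hypothetical names, schemas are flattened to top-level fields, and a stored row is simply a map from column name to value. A requested field that the file lacks comes back as null when it is optional; for a missing required field this model just throws, which is an assumption of the sketch rather than the reader's documented error.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical names, not the parquet-mr API) of reading a stored row
// through a requested schema whose fields may be missing from the file schema.
public class NullFillProjection {

    enum Repetition { REQUIRED, OPTIONAL }

    // A flattened, top-level field: just a name and its repetition.
    static final class Field {
        final String name;
        final Repetition repetition;

        Field(String name, Repetition repetition) {
            this.name = name;
            this.repetition = repetition;
        }
    }

    // Returns the value the caller sees for each requested field, in request order.
    static Map<String, Object> project(List<Field> fileSchema,
                                       Map<String, ?> storedRow,
                                       List<Field> requestedSchema) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Field req : requestedSchema) {
            if (contains(fileSchema, req.name)) {
                // Present in the file: read the stored value.
                out.put(req.name, storedRow.get(req.name));
            } else if (req.repetition == Repetition.OPTIONAL) {
                // Not in the file, but optional: fill with null.
                out.put(req.name, null);
            } else {
                // Not in the file and required: this model refuses to read it.
                throw new IllegalArgumentException(
                        "required field '" + req.name + "' is not in the file schema");
            }
        }
        return out;
    }

    private static boolean contains(List<Field> schema, String name) {
        for (Field f : schema) {
            if (f.name.equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Field> file = List.of(new Field("year", Repetition.OPTIONAL));
        List<Field> requested = List.of(
                new Field("year", Repetition.OPTIONAL),
                new Field("comment", Repetition.OPTIONAL));
        Map<String, Integer> row = Map.of("year", 2015);
        System.out.println(project(file, row, requested)); // {year=2015, comment=null}
    }
}
```

Here `comment` is requested but was never written, so it is returned as null while `year` is read from the stored row.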

Cheng

On 10/14/15 6:25 PM, Alex Levenson wrote:
> It's always a cooperation between the two.
>
> The file schema is how the file was written.
>
> The requested schema is the subset of the file schema you want to read, so
> the requested schema must always be compatible (same types) with, and be a
> subset of, the file schema.
>
> On Wed, Oct 14, 2015 at 6:10 PM, Mohammad Islam <mi...@yahoo.com.invalid>
> wrote:
>
>> Hi,
>>
>> Before I investigate deep into the code, it will be really helpful if
>> someone can help me on this issue.
>>
>> My main question is : if both schemas are provided, the parquet file
>> reader uses which schema.
>> For example, I provide requestedSchema and readSupportMetadata in
>> ReadContext. Looks like, parquet reader is using  the requested schema to
>> read the file.
>>
>> In my case, my request schema has a different data type for a column
>> compared to file schema. For example, one field type is "int" in file
>> schema but "bigint" in requested schema.  I got this exception.
>>
>>
>> "Failed with exception
>> java.io.IOException:parquet.io.ParquetDecodingException: Can not read value
>> at 0 in block -1 in file file:/tmp/my/my_year.pq". But if I use the "int"
>> in my requestedSchema, it works fine.
>>
>> Any quick help or pointer is highly appreciated.
>>
>> Regards,
>> Mohammad
>>
>
>
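Alex's compatibility rule can likewise be sketched as a toy check. The names (`RequestedSchemaCheck`, `incompatibilities`) are hypothetical and this is not how parquet-mr validates schemas; schemas are reduced to a map from column name to physical type. The only outside assumption is the usual Hive-to-Parquet mapping, in which Hive `int` is stored as the physical type INT32 and `bigint` as INT64, so the "int" vs "bigint" mismatch from the question appears as INT32 vs INT64 on the same column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, not parquet-mr code): every requested column that
// exists in the file must have the same physical type as in the file schema.
// Assumes Hive "int" -> INT32 and Hive "bigint" -> INT64.
public class RequestedSchemaCheck {

    // Returns one message per requested column whose type differs from the file.
    // Columns absent from the file are not checked here.
    static List<String> incompatibilities(Map<String, String> fileSchema,
                                          Map<String, String> requestedSchema) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> req : requestedSchema.entrySet()) {
            String fileType = fileSchema.get(req.getKey());
            if (fileType != null && !fileType.equals(req.getValue())) {
                problems.add(req.getKey() + ": file has " + fileType
                        + ", requested " + req.getValue());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, String> file = Map.of("year", "INT32");
        // Requesting the column as bigint (INT64) is flagged.
        System.out.println(incompatibilities(file, Map.of("year", "INT64")));
        // [year: file has INT32, requested INT64]
        // Requesting it with the file's type is fine.
        System.out.println(incompatibilities(file, Map.of("year", "INT32")));
        // []
    }
}
```

Missing columns are deliberately skipped, in line with Cheng's note that absent optional fields are null-filled. If a 64-bit value is wanted, one option is to request the column with its file type (INT32) and widen it to a `long` after reading.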

