Posted to dev@parquet.apache.org by Mohammad Islam <mi...@yahoo.com.INVALID> on 2015/10/15 03:10:45 UTC
requestedSchema vs fileSchema
Hi,
Before I dig deep into the code, it would be really helpful if someone could help me with this issue.
My main question is: if both schemas are provided, which schema does the Parquet file reader use?
For example, I provide requestedSchema and readSupportMetadata in ReadContext. It looks like the Parquet reader uses the requested schema to read the file.
In my case, my requested schema has a different data type for one column than the file schema. For example, one field's type is "int" in the file schema but "bigint" in the requested schema. I got this exception:
"Failed with exception java.io.IOException:parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/tmp/my/my_year.pq". But if I use "int" in my requestedSchema, it works fine.
Any quick help or pointers would be highly appreciated.
Regards,
Mohammad
Re: requestedSchema vs fileSchema
Posted by Cheng Lian <li...@gmail.com>.
Actually, the requested schema does not have to be a subset of the file
schema. If a field in the requested schema doesn't exist in the file
schema, Parquet fills that field with nulls, as long as the field is
optional.
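
To make the rule from this thread concrete, here is a minimal, self-contained sketch in plain Java. It does not use the Parquet API. The SchemaCheck class, the Field type, the result strings, and the column name "year" are all hypothetical and exist only for illustration. It assumes that Hive's "int" and "bigint" correspond to Parquet's int32 and int64, and it models only what is stated in this thread: a requested field that exists in the file must have the same type, and a requested field that is missing from the file is read as null only if it is optional.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the rule discussed in this thread; NOT the Parquet API.
class SchemaCheck {

    // Simplified stand-in for a Parquet field: primitive type name plus repetition.
    static final class Field {
        final String type;       // e.g. "int32" or "int64"
        final boolean optional;  // true = optional, false = required

        Field(String type, boolean optional) {
            this.type = type;
            this.optional = optional;
        }
    }

    // Decide how one requested field relates to the file schema.
    static String check(Map<String, Field> fileSchema, String name, Field requested) {
        Field inFile = fileSchema.get(name);
        if (inFile == null) {
            // Field is absent from the file: per the thread, only optional
            // fields can be filled with nulls.
            return requested.optional ? "READ_AS_NULL" : "ERROR_REQUIRED_FIELD_MISSING";
        }
        // Field is present in the file: per the thread, the types must be the same.
        return inFile.type.equals(requested.type) ? "OK" : "ERROR_TYPE_MISMATCH";
    }

    public static void main(String[] args) {
        // File schema with a single int32 column; the name "year" is made up.
        Map<String, Field> fileSchema = new LinkedHashMap<>();
        fileSchema.put("year", new Field("int32", true));

        // Same type as the file: readable.
        System.out.println(check(fileSchema, "year", new Field("int32", true)));
        // int64 requested for an int32 column: the mismatch case from the question.
        System.out.println(check(fileSchema, "year", new Field("int64", true)));
        // Optional field that the file does not contain: filled with nulls.
        System.out.println(check(fileSchema, "extra", new Field("int64", true)));
        // Required field that the file does not contain: cannot be filled with nulls.
        System.out.println(check(fileSchema, "extra", new Field("int64", false)));
    }
}
```

Under this model the "int" vs "bigint" case from the original question lands in the type-mismatch branch, which is consistent with the read working once the requested schema uses "int".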
Cheng
On 10/14/15 6:25 PM, Alex Levenson wrote:
> It's always a cooperation between the two.
>
> file schema is how the file was written.
>
> requested schema is what subset of the file schema you want to read. So
> requested schema must always be compatible (same types) and be a subset of
> the file schema.
>
> On Wed, Oct 14, 2015 at 6:10 PM, Mohammad Islam <mi...@yahoo.com.invalid>
> wrote:
>
>> Hi,
>>
>> Before I investigate deep into the code, it will be really helpful if
>> someone can help me on this issue.
>>
>> My main question is : if both schemas are provided, the parquet file
>> reader uses which schema.
>> For example, I provide requestedSchema and readSupportMetadata in
>> ReadContext. Looks like, parquet reader is using the requested schema to
>> read the file.
>>
>> In my case, my request schema has a different data type for a column
>> compared to file schema. For example, one field type is "int" in file
>> schema but "bigint" in requested schema. I got this exception.
>>
>>
>> "Failed with exception
>> java.io.IOException:parquet.io.ParquetDecodingException: Can not read value
>> at 0 in block -1 in file file:/tmp/my/my_year.pq". But if I use the "int"
>> in my requestedSchema, it works fine.
>>
>> Any quick help or pointer is highly appreciated.
>>
>> Regards,
>> Mohammad
>>
>
>