Posted to dev@parquet.apache.org by Julien Le Dem <ju...@twitter.com.INVALID> on 2015/04/03 02:46:13 UTC
Re: A question about ParquetWriter schema
Maybe Ryan or Tom can help
On Wed, Mar 18, 2015 at 4:59 PM, Wei Yan <yw...@gmail.com> wrote:
> Hi, devs,
>
>
> I’m new to Parquet. I ran into a ParquetWriter problem and hope
> someone can help me.
>
>
> I use ParquetWriter to write a GenericRecord to a file. The schema used
> to define the ParquetWriter has fewer fields than the GenericRecord.
> For example, the schema for the ParquetWriter is
> {"type":"record","name":"r","fields":[{"name":"f1","type":"double","default":0}]}
> which has only one field, "f1", while the GenericRecord has two fields:
> {"f2": null, "f1": 1.0}.
>
>
> When I use that ParquetWriter to write that record, I expected it to
> write only field “f1” and skip “f2”. However, I got this exception:
> “Null-value for required field: f1”. It looks like the ParquetWriter
> went by field order and tried to match “f2” in the record to “f1” in
> the schema. Is this by design?
>
>
> I would appreciate any help.
>
>
> thanks,
> Wei
>
Re: A question about ParquetWriter schema
Posted by Wei Yan <yw...@gmail.com>.
Thanks for the reply, Ryan.
Yes, you’re right. I was using different schemas to create, read, and write the records.
I ran into that issue because our data schema is evolving, and we have data written with different versions of the schema. I’ll try to make the read and write schemas match.
thanks,
Wei
On Thu, Apr 2, 2015 at 6:52 PM, Ryan Blue <bl...@cloudera.com> wrote:
> Hi Wei,
> It looks like you are using a writer with one schema to write records
> created with another. That doesn't work because Avro deconstructs
> generic records by position. Is there a way you could change your code
> so that you use the final write schema as the read schema for the file
> where your data comes from? You could also construct the records using
> the final schema. You just have to make sure that the schemas match.
> What are you trying to do?
> rb
> --
> Ryan Blue
> Software Engineer
> Cloudera, Inc.
Re: A question about ParquetWriter schema
Posted by Ryan Blue <bl...@cloudera.com>.
Hi Wei,
It looks like you are using a writer with one schema to write records
created with another. That doesn't work because Avro deconstructs
generic records by position. Is there a way you could change your code
so that you use the final write schema as the read schema for the file
where your data comes from? You could also construct the records using
the final schema. You just have to make sure that the schemas match.
What are you trying to do?
rb
--
Ryan Blue
Software Engineer
Cloudera, Inc.
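
The positional matching Ryan describes can be illustrated with a small sketch. This is a simplified model in plain Python, not the Avro or Parquet API: a "schema" is just an ordered list of field names, a "record" is a list of values stored by position, and the helpers `write_by_position` and `project_by_name` are hypothetical names introduced here for illustration. It assumes the writer reads slot i of the record for field i of its own schema, which is consistent with the exception reported in the thread.

```python
# Simplified model only: this is NOT the Avro or Parquet API.
# A "schema" is an ordered list of field names; a "record" holds values by position.


def write_by_position(writer_fields, record_values):
    """Mimic a writer that reads slot i of the record for field i of its schema."""
    row = {}
    for i, name in enumerate(writer_fields):
        value = record_values[i]
        if value is None:
            # Every field is treated as required in this sketch.
            raise ValueError(f"Null-value for required field: {name}")
        row[name] = value
    return row


def project_by_name(writer_fields, record_fields, record_values):
    """Rebuild the record in the writer schema's field order, matching by name."""
    by_name = dict(zip(record_fields, record_values))
    return [by_name[name] for name in writer_fields]


writer_fields = ["f1"]        # writer schema: a single required field
record_fields = ["f2", "f1"]  # record schema: f2 sits at position 0
record_values = [None, 1.0]   # {"f2": null, "f1": 1.0}

# Writing directly reads position 0 (the value of f2) for field f1 and fails.
try:
    write_by_position(writer_fields, record_values)
except ValueError as err:
    print(err)

# Rebuilding the record against the writer schema first makes positions line up.
aligned = project_by_name(writer_fields, record_fields, record_values)
print(write_by_position(writer_fields, aligned))
```

In this model the first call fails because position 0 of the record holds the null value of "f2", while the second succeeds once the record has been rebuilt in the writer schema's field order. That mirrors the suggestion above to construct the records with the final write schema.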