Posted to dev@parquet.apache.org by Julien Le Dem <ju...@twitter.com.INVALID> on 2015/04/03 02:46:13 UTC

Re: A question about ParquetWriter schema

Maybe Ryan or Tom can help

On Wed, Mar 18, 2015 at 4:59 PM, Wei Yan <yw...@gmail.com> wrote:

> Hi, devs,
>
>
> I’m new to Parquet. I ran into a problem with ParquetWriter and hope
> someone can help me.
>
>
> I use ParquetWriter to write a GenericRecord to a file. The schema used
> to define the ParquetWriter has fewer fields than the GenericRecord. For
> example, the schema for the ParquetWriter is
> {"type":"record","name":"r","fields":[{"name":"f1","type":"double","default":0}]}
> which has only one field, "f1", while the GenericRecord has two fields:
> {"f2": null, "f1": 1.0}.
>
>
> When I use that ParquetWriter to write that record, I expected it to
> write only field “f1” and skip “f2”. Instead, I got this exception:
> “Null-value for required field: f1”. It looks like the ParquetWriter goes
> by field order and tried to match “f2” in the record to “f1” in the
> schema. Is this by design?
>
>
> I would really appreciate any help.
>
>
> thanks,
> Wei
>

Re: A question about ParquetWriter schema

Posted by Wei Yan <yw...@gmail.com>.
Thanks for the reply, Ryan.

Yes, you’re right. I was using different schemas to create, read, and write the records.

I ran into this issue because our data schema is evolving, and we have data constructed with different versions of the schema. I’ll try to make the read and write schemas match.
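
As a rough illustration of what making the schemas match could look like when records come from several schema versions, here is a minimal plain-Python sketch. It is not the Avro or Parquet API: the `conform` helper, the `(name, default)` representation of the write schema, and the `f3` field are assumptions made up for this example.

```python
# Plain-Python model only; none of these names come from Avro or Parquet.

# Hypothetical write schema as (field name, default value) pairs, in write order.
WRITE_FIELDS = [("f1", 0.0)]


def conform(record, write_fields):
    """Return a dict holding exactly the write schema's fields, looked up by name.

    Fields missing from the record fall back to the schema default;
    fields not in the write schema are dropped.
    """
    return {name: record.get(name, default) for name, default in write_fields}


# A record built with a schema version that also carries f2.
record_with_extra_field = {"f2": None, "f1": 1.0}
# A record built with a hypothetical schema version that has no f1 at all.
record_without_f1 = {"f3": "x"}

conformed = [
    conform(record_with_extra_field, WRITE_FIELDS),
    conform(record_without_f1, WRITE_FIELDS),
]
```

Every conformed record then has the same fields in the same order as the write schema, so a writer that goes by position sees what it expects.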




thanks,
Wei

On Thu, Apr 2, 2015 at 6:52 PM, Ryan Blue <bl...@cloudera.com> wrote:

> Hi Wei,
> It looks like you are using a writer with one schema to write records
> created with another. That doesn't work because Avro deconstructs
> generic records by position. Is there a way you could change your code
> so that you use the final write schema as the read schema for the file
> where your data comes from? You could also construct the records using
> the final schema. You just have to make sure the schemas match.
> What are you trying to do?
> rb
> -- 
> Ryan Blue
> Software Engineer
> Cloudera, Inc.

Re: A question about ParquetWriter schema

Posted by Ryan Blue <bl...@cloudera.com>.
Hi Wei,

It looks like you are using a writer with one schema to write records
created with another. That doesn't work because Avro deconstructs
generic records by position. Is there a way you could change your code
so that you use the final write schema as the read schema for the file
where your data comes from? You could also construct the records using
the final schema. You just have to make sure the schemas match.
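
To make the positional behavior concrete, here is a minimal plain-Python model of it. This is only a sketch, not the Avro or Parquet code: `write_by_position`, `rebuild_for_schema`, and the list-based schema layout are hypothetical names chosen for illustration, and the error text simply mirrors the message reported in the question.

```python
# Plain-Python model of writing by position; not the Avro or Parquet API.

# Write schema with a single required field, as in the question.
WRITE_SCHEMA = [{"name": "f1", "type": "double", "required": True}]

# Field order of the schema the record was built with: f2 first, then f1.
RECORD_FIELDS = ["f2", "f1"]
RECORD_VALUES = [None, 1.0]


def write_by_position(write_schema, values):
    """Pair the i-th schema field with the i-th record value, ignoring names."""
    row = {}
    for index, field in enumerate(write_schema):
        value = values[index]
        if value is None and field["required"]:
            raise ValueError(f"Null-value for required field: {field['name']}")
        row[field["name"]] = value
    return row


def rebuild_for_schema(write_schema, field_names, values):
    """Re-create the record's values in the write schema's order, by name."""
    by_name = dict(zip(field_names, values))
    return [by_name[field["name"]] for field in write_schema]


# Position 0 of the record holds f2 (null), but the writer treats it as f1.
try:
    write_by_position(WRITE_SCHEMA, RECORD_VALUES)
except ValueError as error:
    mismatch_error = str(error)

# Rebuilding the record against the write schema puts f1 at position 0.
fixed_values = rebuild_for_schema(WRITE_SCHEMA, RECORD_FIELDS, RECORD_VALUES)
written_row = write_by_position(WRITE_SCHEMA, fixed_values)
```

In this model the first write fails because the null stored for f2 sits in the slot the writer reads as f1, and the second succeeds once the record has been rebuilt with the write schema's field order.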

What are you trying to do?

rb



-- 
Ryan Blue
Software Engineer
Cloudera, Inc.