Posted to dev@parquet.apache.org by Micah Kornfield <em...@gmail.com> on 2023/03/01 03:09:25 UTC

Re: Parquet Null logical type question

It is a validation bug that you can write values to, and read them from, the column.
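
The annotation is meant for columns whose values are all null. The sketch below shows the kind of check a writer could enforce to reject other values. It assumes a simplified column descriptor: ColumnSpec and validate_column are hypothetical names chosen for illustration. They are not part of parquet-cpp, parquet-mr, or any other Parquet implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical, simplified column descriptor (not a real Parquet API)."""
    name: str
    physical_type: str           # e.g. "INT32"
    logical_type: Optional[str]  # e.g. "NULL", or None if unannotated


def validate_column(spec: ColumnSpec, values: Sequence[object]) -> None:
    """Raise ValueError if a NULL-annotated column receives a non-null value."""
    if spec.logical_type != "NULL":
        return  # only the Null annotation is checked in this sketch
    for row, value in enumerate(values):
        if value is not None:
            raise ValueError(
                f"column {spec.name!r} is annotated NULL "
                f"but row {row} holds {value!r}"
            )


if __name__ == "__main__":
    spec = ColumnSpec("c", "INT32", "NULL")
    validate_column(spec, [None, None])  # accepted: every value is null
    try:
        validate_column(spec, [None, 7])
    except ValueError as err:
        print(err)  # column 'c' is annotated NULL but row 1 holds 7
```

With a check like this, the Int32 column from the original question could still be declared, but writing a concrete value into it would fail instead of silently succeeding.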

My understanding is that the use case for the type comes from more loosely
typed systems that infer schemas on the fly and then write the data to Parquet.
In these systems, if a column contains only null values, its actual type
cannot be inferred. The Null logical type would, in theory, allow the schema
to evolve afterwards if and when the actual type is discovered while writing
more data to different files. I believe one example of where this is used
is writing Pandas DataFrames to Parquet.
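
The inference-then-evolution pattern described above can be sketched as follows. This sketch assumes a writer that infers one type per column from the values it sees, and a reader that reconciles column types across files. The type names and the functions infer_type and merge_types are hypothetical and only illustrate the idea. They are not taken from Parquet, pandas, or Arrow.

```python
from typing import Iterable, Optional

# Hypothetical marker for "type could not be inferred because every value was null".
NULL = "NULL"


def infer_type(values: Iterable[object]) -> str:
    """Infer a column type from sample values; an all-None column yields NULL."""
    inferred: Optional[str] = None
    for value in values:
        if value is None:
            continue  # nulls carry no type information
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            current = "BOOLEAN"
        elif isinstance(value, int):
            current = "INT64"
        elif isinstance(value, float):
            current = "DOUBLE"
        elif isinstance(value, str):
            current = "STRING"
        else:
            raise TypeError(f"unsupported value {value!r}")
        if inferred is None:
            inferred = current
        elif inferred != current:
            raise TypeError(f"mixed types {inferred} and {current}")
    return NULL if inferred is None else inferred


def merge_types(old: str, new: str) -> str:
    """Reconcile one column's type across files during schema evolution.

    NULL is compatible with every type, so a column that was all nulls in an
    earlier file adopts the concrete type discovered in a later one.
    """
    if old == NULL:
        return new
    if new == NULL or old == new:
        return old
    raise TypeError(f"incompatible column types {old} and {new}")


if __name__ == "__main__":
    first = infer_type([None, None, None])  # "NULL": nothing to infer from
    second = infer_type([None, 3, 4])       # "INT64"
    print(merge_types(first, second))       # INT64
```

Treating NULL as compatible with every other type is what allows the schema to evolve later. A stricter merge that required identical types would reject the second file.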

Cheers,
Micah



On Tue, Feb 28, 2023 at 4:07 PM Jerry Adair <Je...@sas.com.invalid>
wrote:

> Hi,
>
> I am just learning of the Parquet Null logical type.  I've read the
> documentation, as well as the brief inline commentary in the types header.
> That states that the Null logical type can annotate any primitive type.
> What I find confusing is that if I create a Parquet table with a primitive
> type, say Int32 for example, and then assign it the Null logical type, I
> can still write and then read values from that column.  This leads me to a
> more general question: what is the typical use case scenario for a Null
> logical type?  And how is it supposed to work and intended to be used?
>
> Thanks!
>

RE: Parquet Null logical type question

Posted by Jerry Adair <Je...@sas.com.INVALID>.

Hi Micah,

Alright, I understand this. Thank you for the feedback; it is a great help!

By the way, do you know much about the Azure (ADLS) development work in the Arrow C++ filesystem library? I have posted a couple of times to ask about it but have received no response. We have a strong need to use this class, just as we already use the GCS and AWS classes in the filesystem library. I read that it has been developed but not fully integrated into Arrow, and information about it seems hard to come by. I also asked Weston Pace but haven't heard back from him either, so I thought I'd check whether you knew anything.

OK, thanks again; I appreciate the help!

Jerry

