Posted to dev@parquet.apache.org by Micah Kornfield <em...@gmail.com> on 2023/03/01 03:09:25 UTC
Re: Parquet Null logical type question
It is a validation bug that you can read and write values to the column.
My understanding is that the use case for the type comes from more loosely
typed systems that infer schemas on the fly and then write to Parquet.
In these systems, if a column contains only null values, the actual type
cannot be inferred, so the Null logical type would in theory allow for schema
evolution afterwards if/when the actual type is discovered when writing
more data to different files. I believe one example of where this is used
is when writing out Pandas DataFrames to Parquet.
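To make the idea concrete, here is a minimal sketch in plain Python of inferring a schema per file and then evolving it once a real type shows up. It does not use any Parquet or Arrow API; the type names, the "null" placeholder, and the helper functions (infer_column_type, infer_schema, merge_schemas) are all hypothetical and only illustrate the reasoning above.

```python
from typing import Any

# Hypothetical placeholder for a column whose type could not be inferred
# because every value seen so far was None.
NULL = "null"


def infer_column_type(values: list[Any]) -> str:
    """Infer a type name from the non-null values; all-null columns yield NULL."""
    seen = {type(v).__name__ for v in values if v is not None}
    if not seen:
        return NULL
    if len(seen) > 1:
        raise TypeError(f"mixed types in column: {sorted(seen)}")
    return seen.pop()


def infer_schema(rows: list[dict[str, Any]]) -> dict[str, str]:
    """Infer a {column: type} schema from one batch of rows (one 'file')."""
    columns = {name for row in rows for name in row}
    return {
        name: infer_column_type([row.get(name) for row in rows])
        for name in sorted(columns)
    }


def merge_schemas(old: dict[str, str], new: dict[str, str]) -> dict[str, str]:
    """Merge schemas from two files; NULL gives way to any concrete type."""
    merged = dict(old)
    for name, new_type in new.items():
        old_type = merged.get(name, NULL)
        if old_type == NULL:
            # The earlier file never saw a value, so adopt the new type.
            merged[name] = new_type
        elif new_type not in (NULL, old_type):
            raise TypeError(f"column {name!r}: {old_type} vs {new_type}")
    return merged


# The first file has a column with no values at all; the second reveals its type.
first_file = [{"id": 1, "score": None}, {"id": 2, "score": None}]
second_file = [{"id": 3, "score": 0.5}]

schema_1 = infer_schema(first_file)
schema_2 = infer_schema(second_file)
evolved = merge_schemas(schema_1, schema_2)
```

In this sketch the "score" column stays "null" for the first file and only becomes "float" after merging with the second, which is the kind of later schema evolution the Null logical type is meant to leave room for.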
Cheers,
Micah
On Tue, Feb 28, 2023 at 4:07 PM Jerry Adair <Je...@sas.com.invalid>
wrote:
> Hi,
>
> I am just learning of the Parquet Null logical type. I've read the
> documentation, as well as the brief inline commentary in the types header.
> That states that the Null logical type can annotate any primitive type.
> What I find confusing is that if I create a Parquet table with a primitive
> type, say Int32 for example, and then assign it the Null logical type, I
> can still write and then read values from that column. This leads me to a
> more general question: what is the typical use case scenario for a Null
> logical type? And how is it supposed to work and intended to be used?
>
> Thanks!
>
RE: Parquet Null logical type question
Posted by Jerry Adair <Je...@sas.com.INVALID>.
Hi Micah,
Alright, I understand this. Thank you for the feedback, it is of great help!
By the way, do you know much about the Azure (ADLS) development work in the Arrow C++ filesystem library? I have posted a couple of times to inquire about it, but have heard nothing in response. We have a strong need to utilize this class (like we already utilize the GCS and AWS classes in the filesystem library). I read that it has been developed but not fully integrated into Arrow. It seems difficult to get information. I inquired with Weston Pace but didn't hear anything from him either. And so I thought I'd see if you knew anything.
Ok thanks again, I appreciate the help!
Jerry