Posted to user@arrow.apache.org by Chris Nuernberger <ch...@techascent.com> on 2020/07/26 14:00:02 UTC

Validity masks and nullable

Hi, I have a question about the actual file format and how it is reflected
in the Java api.

1.  Are validity masks necessary if nullable is false?
2.  Does the Java API reflect the implications of #1?  Can I create a
vector with a null validity mask?

Thanks again (and again and again) for your help :-).

Chris

Re: Validity masks and nullable

Posted by Chris Nuernberger <ch...@techascent.com>.
Makes sense, I buy that :-).  Thanks.

On Sun, Jul 26, 2020 at 10:38 AM Jacques Nadeau <ja...@apache.org> wrote:

> I think your first question is: can I skip the validity buffer if I know
> all values are defined?
>
> In the Java library, you cannot. This was a design choice to simplify
> implementations. The memory consumption difference is relatively small and
> collapsing the concepts was done to clean up code.
>
> Fun fact: This was done in the second design iteration of the Java library
> (the first one included support for this). We found that many sources of
> data are annotated as nullable but are mostly or entirely non-null. Part of
> this is user laziness, and part is due to tools, which frequently don't
> support generating both types of data (writers of Parquet frequently do
> this, for example). As such, we found that wordwise operations against
> validity vectors, which adapt processing code based on contiguous runs of
> null and non-null values, were actually substantially more beneficial to
> generalized real-world workloads (while also simplifying the codebase).
>
> On Sun, Jul 26, 2020 at 7:00 AM Chris Nuernberger <ch...@techascent.com>
> wrote:
>
>> Hi, I have a question about the actual file format and how it is
>> reflected in the Java api.
>>
>> 1.  Are validity masks necessary if nullable is false?
>> 2.  Does the Java API reflect the implications of #1?  Can I create a
>> vector with a null validity mask?
>>
>> Thanks again (and again and again) for your help :-).
>>
>> Chris
>>
>

Re: Validity masks and nullable

Posted by Jacques Nadeau <ja...@apache.org>.
I think your first question is: can I skip the validity buffer if I know
all values are defined?

In the Java library, you cannot. This was a design choice to simplify
implementations. The memory consumption difference is relatively small and
collapsing the concepts was done to clean up code.

Fun fact: This was done in the second design iteration of the Java library
(the first one included support for this). We found that many sources of
data are annotated as nullable but are mostly or entirely non-null. Part of
this is user laziness, and part is due to tools, which frequently don't
support generating both types of data (writers of Parquet frequently do
this, for example). As such, we found that wordwise operations against
validity vectors, which adapt processing code based on contiguous runs of
null and non-null values, were actually substantially more beneficial to
generalized real-world workloads (while also simplifying the codebase).
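
To make the "wordwise" idea concrete, here is a minimal sketch in plain
Java. It is not code from the Arrow library: the class ValidityWords, the
method sumValid, and the choice to present the validity bitmap as a long[]
are all assumptions made for illustration. Bit i of the bitmap (least
significant bit first within each 64-bit word) marks values[i] as non-null.
The loop inspects 64 validity bits at a time and picks a fast path when the
whole word is valid, skips the word when it is entirely null, and only
falls back to per-bit work for mixed words.

```java
// Hypothetical illustration, not part of the Arrow Java API.
final class ValidityWords {

    /**
     * Sums the non-null entries of values. The validity array is assumed to
     * hold at least one bit per value, LSB-first within each 64-bit word.
     */
    static long sumValid(long[] values, long[] validity) {
        long sum = 0;
        for (int w = 0; w < validity.length; w++) {
            int start = w * 64;
            if (start >= values.length) {
                break; // no values left for this word
            }
            int end = Math.min(start + 64, values.length);
            int n = end - start; // number of values covered by this word (1..64)

            // Mask with one bit set per covered value; ignores padding bits.
            long full = (n == 64) ? -1L : (1L << n) - 1;
            long word = validity[w] & full;

            if (word == full) {
                // Fast path: every value in this run is non-null.
                for (int i = start; i < end; i++) {
                    sum += values[i];
                }
            } else if (word != 0) {
                // Mixed word: visit only the set bits.
                while (word != 0) {
                    int bit = Long.numberOfTrailingZeros(word);
                    sum += values[start + bit];
                    word &= word - 1; // clear the lowest set bit
                }
            }
            // word == 0: the whole run is null, nothing to add.
        }
        return sum;
    }
}
```

With data that is declared nullable but is mostly non-null, almost every
word takes the first branch, so the cost of always carrying a validity
buffer stays small.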

On Sun, Jul 26, 2020 at 7:00 AM Chris Nuernberger <ch...@techascent.com>
wrote:

> Hi, I have a question about the actual file format and how it is reflected
> in the Java api.
>
> 1.  Are validity masks necessary if nullable is false?
> 2.  Does the Java API reflect the implications of #1?  Can I create a
> vector with a null validity mask?
>
> Thanks again (and again and again) for your help :-).
>
> Chris
>