Posted to dev@phoenix.apache.org by Jan Van Besien <ja...@gmail.com> on 2015/01/16 11:50:03 UTC

NULL versus empty arrays

Hi,

In the context of "store nulls", Phoenix seems to store empty arrays
and null arrays both as an empty byte array. We have a use case where
null means something different than empty.

I had a quick look at how arrays are serialized. The serialization
format starts by writing out the length of the array, hence I think it
is relatively easy to change the serialization of empty arrays into "a
byte array that represents an empty array by storing length=0"
instead of "an empty byte array".

I can go ahead and provide patches, but I am wondering whether maybe
there is a reason why Phoenix would not want to make the distinction
between null and empty?

This might also apply to strings?

Thanks
Jan

Re: NULL versus empty arrays

Posted by Jan Van Besien <ja...@gmail.com>.
On Fri, Jan 16, 2015 at 1:16 PM, Gabriel Reid <ga...@gmail.com> wrote:
> Just taking a look at this as well -- I guess another way of putting
> it is that empty arrays don't (yet) exist in Phoenix (right?)

yes indeed.

> My guess would be that it's more the fact that empty arrays don't
> exist (if my assumption about that is correct), and then I guess it's
> just less serialization overhead to store nothing than to store an
> "empty" marker.
>
> I guess if the concept of empty arrays were to be introduced (by
> storing them explicitly), the potential for backwards-compatibility
> issues would be pretty minimal.

Given that the current serialization format for arrays does not
include a length value (if the base type is fixed width), I don't see
how we can write an "empty" marker without changing the serialization
format, thereby introducing a backwards-incompatible change. I think
this is exactly the same reason you give for why it is not possible
for varchar.

Note that I first thought there was already a length value in the array
serialization format, but if I am not mistaken that is only true if
the array base type is variable length. That might have added some
unnecessary confusion to the discussion.
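
To make the problem concrete, here is a toy Java sketch. It is not
Phoenix's actual serialization code; the class and method names are
made up for illustration, and the only assumption taken from this
thread is that fixed-width arrays carry no length value while a null
array is stored as an empty byte array. Under that assumption a
zero-element array and a null array encode to the same bytes, whereas
a format with an element-count header can tell them apart.

```java
import java.nio.ByteBuffer;

// Hypothetical toy model, not Phoenix code.
final class ArrayEncodingSketch {

    // Fixed-width elements written back to back, with no length header.
    // A null array is represented as an empty byte array.
    static byte[] encodeFixedWidth(int[] values) {
        if (values == null) {
            return new byte[0];
        }
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // Same elements, but preceded by an element count.
    // A null array is still represented as an empty byte array.
    static byte[] encodeWithCount(int[] values) {
        if (values == null) {
            return new byte[0];
        }
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + values.length * Integer.BYTES);
        buf.putInt(values.length);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        // Without a header, null and empty are both zero bytes long.
        System.out.println(encodeFixedWidth(null).length);       // prints 0
        System.out.println(encodeFixedWidth(new int[0]).length); // prints 0
        // With a count header, empty is 4 bytes (count = 0), null is 0 bytes.
        System.out.println(encodeWithCount(null).length);        // prints 0
        System.out.println(encodeWithCount(new int[0]).length);  // prints 4
    }
}
```

Adding such a header to an existing fixed-width format changes how
every stored value has to be read, which is where the compatibility
concern comes from.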

What could probably be done is to introduce a new "array that supports
empty arrays" type, but that doesn't sound like a decent workaround
either.

Jan

Re: NULL versus empty arrays

Posted by Gabriel Reid <ga...@gmail.com>.
Inlined below

On Fri, Jan 16, 2015 at 11:50 AM, Jan Van Besien <ja...@gmail.com> wrote:
> Hi,
>
> In the context of "store nulls", Phoenix seems to store empty arrays
> and null arrays both as an empty byte array. We have a use case where
> null means something different than empty.
>
> I had a quick look at how arrays are serialized. The serialization
> format starts by writing out the length of the array, hence I think it
> is relatively easy to change the serialization of empty arrays into "a
> byte array that represents an empty array by storing length=0"
> instead of "an empty byte array".

Just taking a look at this as well -- I guess another way of putting
it is that empty arrays don't (yet) exist in Phoenix (right?)

>
> I can go ahead and provide patches, but I am wondering whether maybe
> there is a reason why Phoenix would not want to make the distinction
> between null and empty?

My guess would be that it's more the fact that empty arrays don't
exist (if my assumption about that is correct), and then I guess it's
just less serialization overhead to store nothing than to store an
"empty" marker.

I guess if the concept of empty arrays were to be introduced (by
storing them explicitly), the potential for backwards-compatibility
issues would be pretty minimal. Code that was doing something like
this to set an array column to null:

   stmt.setArray(1, conn.createArrayOf("INTEGER", new Object[]{}));

instead of doing this:

   stmt.setNull(1, java.sql.Types.ARRAY);

would stop working as it does right now, but that seems like a pretty
far-off edge case.
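
As a rough illustration of the two call paths, a small helper along
these lines keeps the caller's intent explicit at the JDBC level. The
class and method names are hypothetical; only setNull, setArray and
createArrayOf are standard JDBC calls. Whether the two cases end up
stored differently depends on the driver; going by this thread,
Phoenix currently stores both the same way.

```java
import java.sql.Array;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;

// Hypothetical helper, not part of Phoenix.
final class ArrayBindingSketch {

    // Binds SQL NULL when values is null, otherwise binds an array,
    // which may have zero elements.
    static void bindIntegerArray(PreparedStatement stmt, Connection conn,
                                 int index, Integer[] values) throws SQLException {
        if (values == null) {
            // Explicit NULL: JDBC requires the SQL type as second argument.
            stmt.setNull(index, Types.ARRAY);
        } else {
            // Possibly empty array: the element type name is passed as a string.
            Array array = conn.createArrayOf("INTEGER", values);
            stmt.setArray(index, array);
        }
    }
}
```

Code that passes a zero-length array through the second branch while
expecting NULL semantics is the edge case that would change behaviour.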

If my assumptions are all correct here, the question becomes more of:
do we want to introduce empty arrays or not? I don't see a reason
necessarily not to do it, although maybe someone else does?

>
> This might also apply to strings?

I think strings (varchar) are a different case -- the non-existence of
empty strings is in line with what Oracle does, and this would require
changing the actual serialization of varchar columns as well (i.e.
adding a length value to the serialization).

- Gabriel

Re: NULL versus empty arrays

Posted by Jan Van Besien <ja...@gmail.com>.
On Fri, Jan 16, 2015 at 11:50 AM, Jan Van Besien <ja...@gmail.com> wrote:
> I had a quick look at how arrays are serialized. The serialization
> format starts by writing out the length of the array

Upon closer inspection, it does this only if the baseType is a
variable-length type. So a straightforward fix would introduce a
binary-incompatible change in the storage format, at least for arrays
whose baseType is fixed length.

Jan