Posted to dev@arrow.apache.org by Eric Erhardt <Er...@microsoft.com.INVALID> on 2020/08/11 18:34:17 UTC

RE: [EXTERNAL] Re: Value of Date64 type over Date32

Thanks for the info, Wes.

Looking through the Java implementation, I don't see DateMilliVector enforcing the requirement "where the values are evenly divisible by 86400000". We are discussing, for the C# implementation, whether we should allow values that are not evenly divisible by 86400000.

https://github.com/apache/arrow/pull/7654#discussion_r463886892

I'm wondering if C# should allow any values in Date64, or if it should force/coerce the values to be divisible by 86400000.

It doesn't look to me like C++ or Java enforces this. How do other languages handle it?
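
For concreteness, the two options (reject or coerce) can be sketched as follows. This is a minimal Python illustration, not the C# API under review: the names `is_valid_date64`, `coerce_date64`, and `append_date64` are hypothetical, and rounding down to the earlier day boundary is an assumed policy (truncating toward zero would give a different result for pre-1970 values).

```python
MS_PER_DAY = 86_400_000  # milliseconds in one day, per the Schema.fbs comment


def is_valid_date64(ms: int) -> bool:
    """Return True if ms falls exactly on a day boundary."""
    return ms % MS_PER_DAY == 0


def coerce_date64(ms: int) -> int:
    """Round ms down to the start of its day (assumed policy: floor)."""
    # Python's // floors toward negative infinity, so pre-1970 values
    # also move to the earlier midnight.
    return (ms // MS_PER_DAY) * MS_PER_DAY


def append_date64(values: list, ms: int, strict: bool = True) -> None:
    """Hypothetical builder step: reject (strict) or coerce (lenient)."""
    if strict and not is_valid_date64(ms):
        raise ValueError(f"{ms} is not a multiple of {MS_PER_DAY}")
    values.append(coerce_date64(ms))
```

In strict mode a value such as 5 is rejected; in lenient mode it is silently stored as 0, which is the trade-off in question.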

Eric

-----Original Message-----
From: Wes McKinney <we...@gmail.com> 
Sent: Tuesday, August 11, 2020 12:18 PM
To: dev <de...@arrow.apache.org>
Subject: [EXTERNAL] Re: Value of Date64 type over Date32

On Mon, Aug 10, 2020 at 6:19 PM Eric Erhardt <Er...@microsoft.com.invalid> wrote:
>
> I don't understand what the value of the Date64 type is over using Date32:
>
> From 
> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L193-L206
>
> enum DateUnit: short {
>   DAY,
>   MILLISECOND
> }
>
> /// Date is either a 32-bit or 64-bit type representing elapsed time since UNIX
> /// epoch (1970-01-01), stored in either of two units:
> ///
> /// * Milliseconds (64 bits) indicating UNIX time elapsed since the epoch (no
> ///   leap seconds), where the values are evenly divisible by 86400000
> /// * Days (32 bits) since the UNIX epoch
> table Date {
>   unit: DateUnit = MILLISECOND;
> }
>
> If the spec specifies that Date64 must be evenly divisible by 86400000, I don't see the point in using millisecond units. I can't represent any different information in my data. So why would I take up double the space to represent the same information?
>
> Can someone explain when Date64 is useful?

As I recall, the motivation for the date64 type is to allow zero-copy of dates-as-milliseconds, which are used in some other libraries / platforms. For example, Joda uses a millisecond-based "instant". I'm not sure offhand which others do.

That said, it would be perfectly reasonable for a data processing system to use date32 throughout and convert any date64 data to date32 if desired.
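
A conversion of that kind could look roughly like the sketch below. It is plain Python rather than any Arrow API, and the function names are hypothetical; it assumes date64 values that already sit on day boundaries, in which case the round trip loses nothing.

```python
from datetime import date, timedelta

MS_PER_DAY = 86_400_000  # milliseconds per day
EPOCH = date(1970, 1, 1)  # UNIX epoch


def date64_to_date32(ms: int) -> int:
    """Milliseconds since the epoch -> whole days since the epoch."""
    # Floor division: any sub-day remainder is discarded.
    return ms // MS_PER_DAY


def date32_to_date64(days: int) -> int:
    """Whole days since the epoch -> milliseconds since the epoch."""
    return days * MS_PER_DAY


def date32_to_date(days: int) -> date:
    """Interpret a date32 value as a calendar date."""
    return EPOCH + timedelta(days=days)
```

For a well-formed date64 value `ms`, `date32_to_date64(date64_to_date32(ms)) == ms`, so the 32-bit form carries the same information in half the space.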

> Eric

Re: [EXTERNAL] Re: Value of Date64 type over Date32

Posted by Wes McKinney <we...@gmail.com>.
I think we should validate this optionally, in ValidateFull in C++.
Validating unconditionally would be too computationally expensive.

https://issues.apache.org/jira/browse/ARROW-9705
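
To illustrate the cost argument, here is a rough Python sketch of a cheap check next to an opt-in full check. It is not the C++ implementation: `validate` and `validate_full` are hypothetical stand-ins, and representing nulls as `None` is an assumption made for brevity.

```python
from typing import Optional, Sequence

MS_PER_DAY = 86_400_000  # milliseconds per day


def validate(values: Sequence[Optional[int]], length: int) -> None:
    """Cheap O(1) check: only structural metadata is inspected."""
    if len(values) != length:
        raise ValueError("declared length does not match the buffer")


def validate_full(values: Sequence[Optional[int]], length: int) -> None:
    """Opt-in O(n) check: also scans every non-null value."""
    validate(values, length)
    for i, v in enumerate(values):
        # None stands in for a null slot and is skipped.
        if v is not None and v % MS_PER_DAY != 0:
            raise ValueError(f"value {v} at index {i} is not a whole day")
```

The full check touches every element, which is why it is kept out of the default path in this sketch.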
