Posted to dev@thrift.apache.org by Jens Geyer <je...@apache.org> on 2022/06/04 10:39:32 UTC

date/time/timestamp, following ISO 8601

jiayuliu wrote:
 > Taking a step back, I wonder if we can standardize on a paved path
 > for adding newer standalone types in terms of requiredness/optional,
 > plugin mechanism, and/or the level of language support, e.g. if I
 > want to add support for date/time/timestamp, following ISO 8601
 > (https://en.wikipedia.org/wiki/ISO_8601), is that necessarily a
 > good idea to be a standalone type?

Dates and timestamps are a good topic as well, although also a more
complicated one. We discussed that briefly in the past (years ago, I
might add) and came to the conclusion that, because of the sheer
plethora of systems out there and their differing expectations of what
a good date/time format is, it is quite hard to come up with one way
that satisfies all sides.

Main things to consider:

  * what is the offset? i.e. what is the zero ("null") date?
  * what is the precision to store?
  * is there a (good) way to handle it available for each language?

I personally would be thankful if we had some good timestamp data
type, but I also see the problems with it.

JensG

PS: What do you mean by "in terms of requiredness/optional"?  How is that
related?


Re: date/time/timestamp, following ISO 8601

Posted by Jiayu Liu <ji...@hey.com.INVALID>.
I still believe that following the convention used by Arrow, i.e. adding
date32 and date64, where date32 is the number of days since the UNIX
epoch and date64 is the number of milliseconds since the UNIX epoch, can
be useful without introducing ambiguity. For ISO 8601 formatting, users
can still fall back to string; for passing nanoseconds, users can still
use a typedef of i64, etc.

If we have date32 and date64, the generator can produce unambiguous and
idiomatic types in the major languages. For example, in Java, date32
maps from [1] and to [2] LocalDate, and date64 maps from [3] and to [4]
Instant.

[1]: https://docs.oracle.com/javase/8/docs/api/java/time/LocalDate.html#toEpochDay--
[2]: https://docs.oracle.com/javase/8/docs/api/java/time/LocalDate.html#ofEpochDay-long-
[3]: https://docs.oracle.com/javase/8/docs/api/java/time/Instant.html#toEpochMilli--
[4]: https://docs.oracle.com/javase/8/docs/api/java/time/Instant.html#ofEpochMilli-long-
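
As a rough illustration of the mapping described above, here is a minimal
Python sketch (Python rather than Java, purely for brevity). It assumes
date32 is a signed count of days and date64 a signed count of milliseconds
since the UNIX epoch, as in Arrow. The helper names are hypothetical and
not part of Thrift or Arrow.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: both types count from the UNIX epoch, 1970-01-01T00:00:00Z.
EPOCH_DATE = date(1970, 1, 1)
EPOCH_INSTANT = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date32_from_date(d: date) -> int:
    """Days since the UNIX epoch (hypothetical date32 encoding)."""
    return (d - EPOCH_DATE).days


def date32_to_date(days: int) -> date:
    """Inverse of date32_from_date."""
    return EPOCH_DATE + timedelta(days=days)


def date64_from_instant(dt: datetime) -> int:
    """Whole milliseconds since the UNIX epoch (hypothetical date64 encoding).

    Expects a timezone-aware datetime; sub-millisecond digits are floored.
    """
    return (dt - EPOCH_INSTANT) // timedelta(milliseconds=1)


def date64_to_instant(millis: int) -> datetime:
    """Inverse of date64_from_instant, always returned in UTC."""
    return EPOCH_INSTANT + timedelta(milliseconds=millis)


days = date32_from_date(date(2022, 6, 4))
millis = date64_from_instant(
    datetime(2022, 6, 4, 10, 39, 32, tzinfo=timezone.utc)
)
```

Both directions are plain integer arithmetic, which is the point of the
proposal: no string parsing and no timezone handling inside the transport.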

On June 5, 2022, Jens Geyer <je...@apache.org> wrote:
> That being said, I would happily support it if we settle on an 
> agreement. Would be a good thing.

Re: date/time/timestamp, following ISO 8601

Posted by Jens Geyer <je...@apache.org>.
That being said, I would happily support it if we settle on an 
agreement. Would be a good thing.



Re: date/time/timestamp, following ISO 8601

Posted by Yuxuan Wang <yu...@reddit.com.INVALID>.
I think precision is a big issue for a timestamp type. I found an article
that discusses this issue across languages:
https://nickb.dev/blog/iso8601-and-nanosecond-precision-across-languages/

In our own experience, we just typedef i64 to TimestampMilliseconds for
milliseconds precision (example
<https://github.com/reddit/baseplate.py/blob/94b2fc3616c912cc558800e023be10968b41c104/baseplate/thrift/baseplate.thrift#L8>)
and that's good enough for most of our cases. If someone needs a higher
precision than milliseconds, they can just typedef another one.

An ISO-8601 timestamp type, in comparison, would incur much bigger overhead
when converting to/from string, hugely increased complexity in dealing
with the variations of the string, and no guarantee that the precision
represented in the string will be preserved by the language library.
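
To make the precision concern concrete, here is a small Python sketch. It
relies on the fact that Python's standard `datetime` type stores
microseconds, so a nanosecond timestamp that passes through it (and through
the ISO 8601 string it produces) loses its last three digits. The sample
value and helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ns_to_datetime(ns: int) -> datetime:
    """Convert nanoseconds since the UNIX epoch to a datetime.

    datetime only has microsecond resolution, so the remainder is dropped.
    """
    return EPOCH + timedelta(microseconds=ns // 1000)


def datetime_to_ns(dt: datetime) -> int:
    """Convert a timezone-aware datetime back to nanoseconds since the epoch."""
    delta = dt - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1000


original_ns = 1_654_339_172_123_456_789  # illustrative nanosecond timestamp
as_datetime = ns_to_datetime(original_ns)
iso_string = as_datetime.isoformat()     # carries only six fractional digits
lost_ns = original_ns - datetime_to_ns(as_datetime)
```

An i64 carrying the nanosecond count directly would round-trip unchanged,
whereas the value above comes back 789 ns short.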

So overall I don't think an ISO-8601 timestamp type is a good idea.

An ISO-8601 date time might be ok, but its usage is much smaller compared to
timestamps, and I'm not sure if it's worth the effort.

On Sat, Jun 4, 2022 at 8:46 AM Liu Jiayu <ji...@apache.org> wrote:

> thank you for the detailed context - i didn't know about the past
> discussion and i can totally relate to the fact that it's a complex topic.
>
> coming from apache arrow, i've come to know its design choices (date32/64)
> - it's not perfect but i guess worth looking at, given it's also a
> cross-language protocol with careful design on memory layout and processing
> efficiency.
>
> https://arrow.apache.org/docs/format/CDataInterface.html
>
> For date32 it stores in a 4-byte area the number of days past UNIX
> epoch, and for date64 it stores in an 8-byte area the number of milliseconds
> past UNIX epoch. To me that sounds like a good standard across all
> languages.

Re: date/time/timestamp, following ISO 8601

Posted by Liu Jiayu <ji...@apache.org>.
thank you for the detailed context - i didn't know about the past discussion and i can totally relate to the fact that it's a complex topic.

coming from apache arrow, i've come to know its design choices (date32/64) - it's not perfect but i guess worth looking at, given it's also a cross-language protocol with careful design on memory layout and processing efficiency.

https://arrow.apache.org/docs/format/CDataInterface.html

For date32 it stores in a 4-byte area the number of days past UNIX epoch, and for date64 it stores in an 8-byte area the number of milliseconds past UNIX epoch. To me that sounds like a good standard across all languages.


Re: date/time/timestamp, following ISO 8601

Posted by ul...@gmail.com.
If you can reach agreement between both ends of the transport -- lucky
you -- you're not affected by standardization. It makes no difference
for you whether ISO 8601 is supported by Thrift or not: you have the
option to go with an all-custom protocol, date64, nanoseconds, whatnot --
and you always had it.

Standards start to matter in the other scenario: where you *can't*
reach an agreement between developers, teams, departments (writing in
languages A and B). Then, forcing a decision by adopting a *standard*
format becomes a mutually acceptable middle ground for compromise, and
the only way to make progress.

I hope that provides some context about the primary function of 
standards like ISO 8601 (and that it's social, not technical).

Max

On Mon, Jun 6 2022 at 08:58:43 AM -0700, Yuxuan Wang 
<yu...@reddit.com.INVALID> wrote:
> On Mon, Jun 6, 2022 at 5:36 AM <ul...@gmail.com> wrote:
>
>> IMO, supporting even a poor standard is still better than supporting
>> none at all.
>>
>> Datetime handling is indeed a nuanced topic. But consider: how much
>> of the complexity should fall onto application code VS how much onto
>> a framework like Thrift?
>>
>> As a cross-language RPC, Thrift should focus on interoperable data
>> transport. With datetimes, that excludes any sort of processing from
>> the scope: no timezone conversions, no (re)setting of epoch (null
>> point), no leap seconds, no resolution rounding, no localization.
>> In an RPC framework, all of that is unwanted extra. It belongs to
>> applications. The job of the framework is transport of data (e.g.
>> datetimes) from language A to language B.
>
> This is exactly why the precision in datetime is such a big issue.
>
> Since the job of thrift is "transport of data (timestamp) from A to
> B", it needs to preserve the max precision possible, and leave it to
> the application to decide if they actually need to round it to a
> lower resolution. What's the max precision possible for a timestamp?
> That's certainly not milliseconds. Currently it's _probably_
> nanoseconds, but that can also change in the future.
>
> Which IMHO makes it a poor decision for thrift to say "milliseconds
> is the best precision we can provide for datetime". A precision
> agreed between both ends of the transport, on a case-by-case basis,
> is good enough, as typedef-ing an i64 is trivial.
>
> Back to Jiayu's proposal of date64 providing millisecond precision
> and date32 providing day precision: that sounds like a very big
> surprise to me from the names of the types. It would be so much
> better to include the unit in the name of the types.



Re: date/time/timestamp, following ISO 8601

Posted by Yuxuan Wang <yu...@reddit.com.INVALID>.
On Mon, Jun 6, 2022 at 5:36 AM <ul...@gmail.com> wrote:

> IMO, supporting even a poor standard is still better than supporting
> none at all.
>
> Datetime handling is indeed a nuanced topic. But consider: how much of
> the complexity should fall onto application code VS how much onto a
> framework like Thrift?
>
> As a cross-language RPC, Thrift should focus on interoperable data
> transport. With datetimes, that excludes any sort of processing from
> the scope: no timezone conversions, no (re)setting of epoch (null
> point), no leap seconds, no resolution rounding, no localization. In an
> RPC framework, all of that is unwanted extra. It belongs to
> applications. The job of the framework is transport of data (e.g.
> datetimes) from language A to language B.
>

This is exactly why the precision in datetime is such a big issue.

Since the job of thrift is "transport of data (timestamp) from A to B",
it needs to preserve the max precision possible, and leave it to the
application to decide if they actually need to round it to a lower
resolution. What's the max precision possible for a timestamp? That's
certainly not milliseconds. Currently it's _probably_ nanoseconds, but
that can also change in the future.

Which IMHO makes it a poor decision for thrift to say "milliseconds is
the best precision we can provide for datetime". A precision agreed
between both ends of the transport, on a case-by-case basis, is good
enough, as typedef-ing an i64 is trivial.

Back to Jiayu's proposal of date64 providing millisecond precision and
date32 providing day precision: that sounds like a very big surprise to
me from the names of the types. It would be so much better to include
the unit in the name of the types.
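
A minimal Python sketch of what "unit in the name" can look like on the
application side, assuming the wire types are plain i64 typedefs.
`TimestampMilliseconds` is the typedef name mentioned earlier in the thread;
`TimestampNanoseconds` and `to_milliseconds` are hypothetical names used
only for illustration.

```python
from typing import NewType

# Python-side aliases mirroring `typedef i64 ...` declarations in a .thrift
# file. At runtime both are plain ints; the names exist for readers and for
# static type checkers.
TimestampMilliseconds = NewType("TimestampMilliseconds", int)
TimestampNanoseconds = NewType("TimestampNanoseconds", int)  # hypothetical


def to_milliseconds(ns: TimestampNanoseconds) -> TimestampMilliseconds:
    """Explicit, lossy conversion: floors to whole milliseconds."""
    return TimestampMilliseconds(ns // 1_000_000)


ns = TimestampNanoseconds(1_654_339_172_123_456_789)
ms = to_milliseconds(ns)
```

With the unit spelled out in the type name, the rounding step is a visible
decision in application code rather than something the transport does
silently.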



Re: date/time/timestamp, following ISO 8601

Posted by ul...@gmail.com.
IMO, supporting even a poor standard is still better than supporting 
none at all.

Datetime handling is indeed a nuanced topic. But consider: how much of 
the complexity should fall onto application code VS how much onto a 
framework like Thrift?

As a cross-language RPC, Thrift should focus on interoperable data 
transport. With datetimes, that excludes any sort of processing from 
the scope: no timezone conversions, no (re)setting of epoch (null 
point), no leap seconds, no resolution rounding, no localization. In an 
RPC framework, all of that is unwanted extra. It belongs to 
applications. The job of the framework is transport of data (e.g. 
datetimes) from language A to language B.

And when it comes to interoperability -- following standards is king.
Why add UUID support; doesn't bytearray support suffice?.. No, it
does not suffice. No one likes to emulate such a basic feature by
transporting UUIDs in bytearrays, which entails the complexity cost of
conversion boilerplate (potentially written out in N languages) and an
expanded bug surface.

Same with datetimes.

Max @ulidtko
