You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@parquet.apache.org by Julien Le Dem <ju...@dremio.com> on 2017/02/23 18:01:35 UTC

parquet sync starting now

https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up

-- 
Julien

Re: parquet sync starting now

Posted by Lars Volker <lv...@cloudera.com>.

Thank you Julien for writing up the notes! Here is the Impala JIRA I
mentioned that tracks swapping the fields of TimestampValue: IMPALA-4825
<https://issues.cloudera.org/browse/IMPALA-4825>

A change is out for review here: https://gerrit.cloudera.org/#/c/6048/

Cheers, Lars

On Thu, Feb 23, 2017 at 11:22 AM, Julien Le Dem <ju...@dremio.com> wrote:

>  Attendees/agenda:
> - Nandor, Zoltan (Cloudera/file formats)
> - Lars (Cloudera/Impala)" Statistics progress
> - Uwe (Blue Yonder): Parquet cpp RC. Int96 timestamps
> - Wes (twosigma): parquet cpp rc. 1.0 Release
> - Julien (Dremio): parquet metadata. Statistics.
> - Deepak (HP/Vertica): Parquet-cpp
> - Kazuaki:
> - Ryan was excused :)
>
> Note:
>  - Statistics: https://github.com/apache/parquet-format/pull/46
>    - Impala is waiting for parquet-format to settle on the format to
> finalize their simple mentation.
>    - Action: Julien to follow up with Ryan on the PR
>
>  - Int96 timestamps: https://github.com/apache/parquet-format/pull/49
> (needs Ryan's feedback)
>    - format is nanosecond level timestamp from midnight (64 bits) followed
> by number of days (32 bits)
>    - it sounds like int96 ordering is different from natural byte array
> ordering because days is last in the bytes
>    - discussion about swapping bytes:
>       - format dependent on the boost library used
>       - there could be performance concerns in Impala against changing it
>       - there may be a separate project in impala to swap the bytes for
> kudu compatibility.
>    - discussion about deprecating int96:
>      - need to be able to read them always
>      - not need to define ordering if we have a clear replacement
>      - Need to clarify the requirement for alternative .
>      - int64 could be enough it sounds that nanosecond granularity might
> not be needed.
>    - Julien to create JIRAs:
>      - int96 ordering
>      - int96 deprecation, replacement.
>
> - extra timestamp logical type:
>  - floating timestamp: (not TZ stored. up to the reader to interpret TS
> based on their TZ)
>     - this would be better for following sql standard
>     - Julien to create JIRA
>  - timestamp with timezone (per SQL):
>     - each value has timezone
>     - TZ can be different for each value
>     - Julien to create JIRA
>
>  - parquet-cpp 1.0 release
>    - Uwe to update release script in master.
>    - Uwe to launch a new vote with new RC
>
>  - make impala depend on parquet-cpp
>   - duplication between parquet/impala/kudu
>   - need to measure level of overlap
>   - Wes to open JIRA for this
>   - also need an "apache commons for c++” for SQL type operations:
>      -> could be in arrow
>
>   - metadata improvements.
>    - add page level metadata in footer
>    - page skipping.
>    - Julien to open JIRA.
>
>  - add version of the writer in the footer (more precise than current).
>    - Zoltan to open Jira
>    - possibly add bitfield for bug fixes.
>
>
>
>
>
>
>
>
>
> On Thu, Feb 23, 2017 at 10:01 AM, Julien Le Dem <ju...@dremio.com> wrote:
>
> > https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
> >
> > --
> > Julien
> >
>
>
>
> --
> Julien
>

Re: parquet sync starting now

Posted by Deepak Majeti <ma...@gmail.com>.

I am in favor of the two timestamp type solution as well.
We also have a choice between nanosecond and microsecond/millisecond
precision. Not all tools require nanosecond precision.
I propose the following.

- Add two logical types for nanosecond precision (TIMESTAMP, TIMESTAMP_TZ).
  The underlying physical type will be Fixed Length Byte Array (length 12).
First 4 bytes for the number of days and 8 bytes for nanoseconds elapsed on
that day.

- Add two more logical types for microsecond/millisecond precision
(TIMESTAMP_TZ_MICROS, TIMESTAMP_TZ_MILLIS). TIMESTAMP_MICROS and
TIMESTAMP_MILLIS types already exist.
  The underlying physical type will be INT64. Microseconds/milliseconds
from epoch.

TIMSTAMP_TZ types will be stored in UTC.
The question now comes to storing the TIMESTAMP like a string. One way is
to convert this to UTC as well and also store the writer timezone in the
schema. Since the parquet library will use the same timezone to write and
read timestamp values from UTC, we can get a constant timestamp value.


On Mon, Feb 27, 2017 at 9:00 PM, Greg Rahn <gr...@cloudera.com> wrote:

> I think the decision comes down to how many TIMESTAMP types does Parquet
> (and systems that use it a format) want to support or the use cases that
> are being targeted.
>
> If the answer is two, then it makes sense to follow the ANSI standard and
> what Postgres et al. have done:
> - timestamp [ without time zone ] - never adjust for TZ, treated like a
> timestamp string
> - timestamp with time zone - normalize to UTC based on explicit TZ input
> or implicit TZ from env, results normalized to local/client TZ
>
> If the answer is three, then it makes sense to mimic Oracle’s timestamp
> types despite the differences from ANSI SQL naming/behavior:
> - timestamp [ without time zone ] - never adjust for TZ, treated like a
> timestamp string
> - timestamp with local time zone - normalize to UTC based on explicit TZ
> or implicit TZ from env, results normalized to local/client TZ
> - timestamp with time zone - explicitly store the TZ and never normalize
> or convert, results may contain >1 TZ
>
> It seems to me that the two timestamp type solution is the most popular,
> appears to solve most requirements, and one could explicitly store the TZ
> offset in another column and systems could provide a solution/function to
> convert and return results that contain >1 TZ.  Without normalization and
> storing the TZ explcitly, it forces the application to apply a conversion
> function for any comparisons operations otherwise it becomes logically
> impossible (or ambiguous) to apply date range filters on such a type.  This
> has an obvious performance impact.
>
> My vote would be for the two type solution, however, it is worth
> explicitly noting, that "timestamp with time zone" requires functionality
> beyond the Parquet file format to do the TZ adjustments whether that be
> done server-side on the result sets knowing client-side TZ settings, or in
> the client-side driver code.  Obviously the latter can result in many more
> places for bugs to creep in as every driver implementation needs to do the
> correct TZ adjustment.
>
>
> > On Feb 27, 2017, at 4:42 PM, Marcel Kornacker <ma...@gmail.com> wrote:
> >
> > Greg, thanks for this writeup.
> >
> > Going back to "timestamp with timezone" in Parquet: does anything
> > speak *against* following the SQL standard and storing UTC without an
> > attached timezone (and leaving it to the client to do the conversion
> > correctly for timestamp literals)?
> >
> > On Mon, Feb 27, 2017 at 4:03 PM, Greg Rahn <gr...@cloudera.com> wrote:
> >> As pointed out, there are several different behaviors and type names
> from
> >> the SQL world:
> >>
> >> timestamp without time zone (aka timestamp)
> >>
> >> value not normalized - behaves same as storing the timestamp literal as
> a
> >> string or datetime with fractional seconds
> >> value normalized to UTC based on local environment and adjusted to local
> >> timezone on retrieval
> >>
> >> timestamp with time zone (aka timestamptz)
> >>
> >> value normalized to UTC based on specified time zone offset or value or
> >> falls back to client env when not specified,  adjusted to local
> timezone on
> >> retrieval
> >> value not normalized and time zone value is actually stored based on
> >> specified time zone offset or value or falls back to client env when not
> >> specified, values returned are not normalized to local time zone
> >>
> >> I'm not sure I would 100% agree with your table Zoltan, or the comment
> on
> >> Postgres not following the ANSI standard, as this is a key description
> from
> >> the ANSI standard:
> >>
> >>> A datetime value, of data type TIME WITHOUT TIME ZONE or TIMESTAMP
> WITHOUT
> >>> TIME ZONE, may represent a local time, whereas a datetime value of
> data type
> >>> TIME WITH TIME ZONE or TIMESTAMP WITH TIME ZONE represents UTC.
> >>
> >>
> >> Thus one challenge that exists is that there are two valid behaviors for
> >> TIMESTAMP [WITHOUT TIME ZONE] — conversion from/to local time, or no
> >> adjustment.  The other challenge here is that what some RDBMS systems
> call
> >> the type corresponds to an ANSI type name, however, the behavior
> differs.
> >> This is exactly the case for ANSI's TIMESTAMP WITH TIME ZONE and
> Oracle's
> >> TIMESTAMP WITH TIME ZONE as Oracle's TIMESTAMP WITH LOCAL TIME ZONE
> would
> >> map to ANSI TIMESTAMP WITH TIME ZONE because neither store the TZ info,
> but
> >> both adjust for it.  Oracle's TIMESTAMP WITH TIME ZONE does store the TZ
> >> explicitly but does not adjust.  Oracle notes this difference in naming
> >> between ANSI & Oracle using a disclaimer [1]:
> >>
> >>> This chapter describes Oracle datetime and interval datatypes. It does
> not
> >>> attempt to describe ANSI datatypes or other kinds of datatypes except
> when
> >>> noted.
> >>
> >>
> >> Also I will note some wording clarity -- when the ANSI SQL standard
> uses the
> >> phrase "time zone displacement" this means the data type includes a
> value
> >> for the time zone offset (displacement from UTC), be it explicitly or
> >> implicitly.  It does not mean that "time zone displacement" is actually
> >> stored on disk as this contradicts this ANSI statement:
> >>
> >>> whereas a datetime value of data type TIMESTAMP WITH TIME ZONE
> represents
> >>> UTC.
> >>
> >>
> >> So IMO, all the Postgres-based systems (Vertica, Redshift, Greenplum,
> etc)
> >> implement the ANSI standard for TIMESTAMP WITH TIME ZONE — they
> normalize
> >> input values to UTC and return all values converted to local time zone.
> >> Greenplum explicitly cites their TIMESTAMP implementation is ANSI
> SQL:2008
> >> compliant for feature id F051-03 [2] (which is the ANSI TIMESTAMP type).
> >>
> >> Also see these behavior notes and documentation.
> >>
> >> Vertica -
> >> https://my.vertica.com/docs/8.0.x/HTML/index.htm#Authoring/
> SQLReferenceManual/DataTypes/Date-Time/TIMESTAMP.htm
> >> PostgreSQL -
> >> https://www.postgresql.org/docs/current/static/datatype-
> datetime.html#DATATYPE-TIMEZONES
> >> Redshift -
> >> http://docs.aws.amazon.com/redshift/latest/dg/r_Datetime_
> types.html#r_Datetime_types-timestamptz
> >>
> >>
> >> If I look at all of this and compare it to the Parquet Logical Types [3]
> >> doc, to me the thing that is missing is an unambiguous statement of
> client
> >> behavior for timezone conversion or not.  To my knowledge Apache Drill
> is
> >> the only system that seems to have implemented the Parquet
> TIMESTAMP_MILLIS
> >> type and it looks like they chose to implement it using the normalize
> to UTC
> >> behavior which is not my preference and not what most RDBMS systems do,
> >> including Postgres, Vertica, Redshift, Greenplum, Netezza, and Oracle
> do for
> >> TIMESTAMP [WITHOUT TIME ZONE].
> >>
> >> For example, changing the local system timezone setting does not change
> the
> >> results of the query for a TIMESTAMP [WITHOUT TIME ZONE] type:
> >>
> >> create table ts1 (t timestamp without time zone);
> >> insert into ts1 values (timestamp '2009-05-12 12:00:00');
> >> insert into ts1 values (timestamp '2009-05-13 12:00:00');
> >> insert into ts1 values (timestamp '2009-05-14 12:00:00');
> >>
> >> show timezone;
> >>   name   |  setting
> >> ----------+------------
> >> timezone | US/Eastern
> >>
> >> select * from ts1;
> >>          t
> >> ---------------------
> >> 2009-05-12 12:00:00
> >> 2009-05-13 12:00:00
> >> 2009-05-14 12:00:00
> >>
> >> set timezone US/Pacific;
> >> show timezone;
> >>
> >>   name   |  setting
> >> ----------+------------
> >> timezone | US/Pacific
> >>
> >> select * from ts1;
> >>          t
> >> ---------------------
> >> 2009-05-12 12:00:00
> >> 2009-05-13 12:00:00
> >> 2009-05-14 12:00:00
> >>
> >>
> >> BTW, Presto has come to realize their implementation is different than
> the
> >> ANSI SQL standard as well.  This GitHub issue [4] has a good writeup of
> >> things.
> >>
> >> References
> >> [1]
> >> http://docs.oracle.com/database/122/NLSPG/datetime-
> data-types-and-time-zone-support.htm#NLSPG237
> >> [2] https://gpdb.docs.pivotal.io/4320/ref_guide/SQL2008_
> support.html#topic1
> >> [3] https://github.com/apache/parquet-format/blob/master/
> LogicalTypes.md
> >> [4] https://github.com/prestodb/presto/issues/7122
> >>
> >>
> >>
> >>
> >> On Mon, Feb 27, 2017 at 10:45 AM Marcel Kornacker <ma...@gmail.com>
> wrote:
> >>>
> >>> On Mon, Feb 27, 2017 at 10:43 AM, Zoltan Ivanfi <zi...@cloudera.com>
> wrote:
> >>>> What you describe (storing in UTC and adjusting to local time) is the
> >>>> implicit timezone that is associated with the plain TIMEZONE type of
> >>>> ANSI
> >>>> SQL. Excerpts:
> >>>
> >>> Postgres allows explicit timezone offsets in timestamp literals. When
> >>> these are present they override the implicit local timezone.
> >>
> >>
> >>
> >>
> >>>
> >>>  Datetime data types that contain time fields (TIME and TIMESTAMP) are
> >>> maintained
> >>>  in Universal Coordinated Time (UTC), with an explicit or implied time
> >>> zone
> >>> part.
> >>>
> >>>  A TIME or TIMESTAMP that does not specify WITH TIME ZONE has an im-
> >>>  plicit time zone equal to the local time zone for the SQL-session.
> >>>  The value of time represented in the data changes along with the
> >>>  local time zone for the SQL-session. However, the meaning of the
> >>>  time does not change because it is effectively maintained in UTC.
> >>>
> >>> Zoltan
> >>>
> >>> On Mon, Feb 27, 2017 at 7:34 PM Marcel Kornacker <ma...@gmail.com>
> >>> wrote:
> >>>>
> >>>> On Mon, Feb 27, 2017 at 8:47 AM, Zoltan Ivanfi <zi...@cloudera.com>
> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Although the draft of SQL-92[1] does not explicitly state that the
> time
> >>>>> zone
> >>>>> offset has to be stored, the following excerpts strongly suggest that
> >>>>> the
> >>>>> time zone has to be stored with each individual value of TIMESTAMP
> WITH
> >>>>> TIME
> >>>>> ZONE:
> >>>>>
> >>>>>  The length of a TIMESTAMP is 19 positions [...]
> >>>>>  The length of a TIMESTAMP WITH TIME ZONE is 25 positions [...]
> >>>>>
> >>>>> The draft of SQL 2003[2] standard is more specific:
> >>>>>
> >>>>>  TIMESTAMP and TIME may also be specified as being WITH TIME ZONE, in
> >>>>> which
> >>>>> case every value has associated with it a time zone displacement.
> >>>>
> >>>> It is not clear to me whether that means that the timezone is *stored*
> >>>> with the value. In Postgres, a timestamp still has an associated
> >>>> timezone on input and output (but it only stores the timezone
> >>>> normalized to UTC).
> >>>>
> >>>>>
> >>>>> However, the TIMESTAMP WITH TIME ZONE type in PostgreSQL does not
> >>>>> conform to
> >>>>> this definition, but behaves like the plain TIMESTAMP type of the SQL
> >>>>> specification. Oracle calls this type TIMESTAMP WITH LOCAL TIME
> >>>>> instead.
> >>>>>
> >>>>> It's easier to get an overview on this chaos if we list the different
> >>>>> interpretations and data types:
> >>>>>
> >>>>> Timestamp stored with a time zone:
> >>>>> - ANSI SQL type: TIMESTAMP WITH TIME ZONE
> >>>>> - Oracle type: TIMESTAMP WITH TIME ZONE
> >>>>> - PostgeSQL type: -
> >>>>>
> >>>>> Timestamp stored without a time zone that represents a time in UTC
> >>>>> (automatically converted to/from local time):
> >>>>> - ANSI SQL type: TIMESTAMP
> >>>>> - Oracle type: TIMESTAMP WITH LOCAL TIME
> >>>>> - PostgeSQL type: TIMESTAMP WITH TIME ZONE
> >>>>> - Parquet type: TIMESTAMP_MILLIS
> >>>>>
> >>>>> Timestamp stored without a time zone that represents floating time
> (has
> >>>>> the
> >>>>> same apparent value regardless of time zone, does not refer to an
> >>>>> absolute
> >>>>> time instant):
> >>>>> - ANSI SQL type: -
> >>>>> - Oracle type: TIMESTAMP
> >>>>> - PostgeSQL type: TIMESTAMP
> >>>>> - Impala type: TIMESTAMP (stored as INT96 in Parquet)
> >>>>>
> >>>>> [1] http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
> >>>>> [2] http://www.wiscorp.com/sql_2003_standard.zip
> >>>>>
> >>>>> Zoltan
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Thu, Feb 23, 2017 at 10:46 PM Marcel Kornacker <marcelk@gmail.com
> >
> >>>>> wrote:
> >>>>>>
> >>>>>> Regarding timestamp with timezone: I'm not sure whether the SQL
> >>>>>> standard requires the timezone to be stored along with the timestamp
> >>>>>> for 'timestamp with timezone' (at least Oracle and Postgres diverge
> on
> >>>>>> that topic).
> >>>>>>
> >>>>>> Cc'ing Greg Rahn to shed some more light on that.
> >>>>>>
> >>>>>> Regarding 'make Impala depend on parquet-cpp': could someone expand
> on
> >>>>>> why we want to do this? There probably is overlap between
> >>>>>> Impala/Kudu/parquet-cpp, but the runtime systems of the first two
> have
> >>>>>> specific requirements (and are also different from each other), so
> >>>>>> trying to unify this into parquet-cpp seems difficult.
> >>>>>>
> >>>>>> On Thu, Feb 23, 2017 at 11:22 AM, Julien Le Dem <ju...@dremio.com>
> >>>>>> wrote:
> >>>>>>> Attendees/agenda:
> >>>>>>> - Nandor, Zoltan (Cloudera/file formats)
> >>>>>>> - Lars (Cloudera/Impala)" Statistics progress
> >>>>>>> - Uwe (Blue Yonder): Parquet cpp RC. Int96 timestamps
> >>>>>>> - Wes (twosigma): parquet cpp rc. 1.0 Release
> >>>>>>> - Julien (Dremio): parquet metadata. Statistics.
> >>>>>>> - Deepak (HP/Vertica): Parquet-cpp
> >>>>>>> - Kazuaki:
> >>>>>>> - Ryan was excused :)
> >>>>>>>
> >>>>>>> Note:
> >>>>>>> - Statistics: https://github.com/apache/parquet-format/pull/46
> >>>>>>>   - Impala is waiting for parquet-format to settle on the format to
> >>>>>>> finalize their simple mentation.
> >>>>>>>   - Action: Julien to follow up with Ryan on the PR
> >>>>>>>
> >>>>>>> - Int96 timestamps:
> >>>>>>> https://github.com/apache/parquet-format/pull/49
> >>>>>>> (needs Ryan's feedback)
> >>>>>>>   - format is nanosecond level timestamp from midnight (64 bits)
> >>>>>>> followed
> >>>>>>> by number of days (32 bits)
> >>>>>>>   - it sounds like int96 ordering is different from natural byte
> >>>>>>> array
> >>>>>>> ordering because days is last in the bytes
> >>>>>>>   - discussion about swapping bytes:
> >>>>>>>      - format dependent on the boost library used
> >>>>>>>      - there could be performance concerns in Impala against
> >>>>>>> changing
> >>>>>>> it
> >>>>>>>      - there may be a separate project in impala to swap the bytes
> >>>>>>> for
> >>>>>>> kudu compatibility.
> >>>>>>>   - discussion about deprecating int96:
> >>>>>>>     - need to be able to read them always
> >>>>>>>     - not need to define ordering if we have a clear replacement
> >>>>>>>     - Need to clarify the requirement for alternative .
> >>>>>>>     - int64 could be enough it sounds that nanosecond granularity
> >>>>>>> might
> >>>>>>> not be needed.
> >>>>>>>   - Julien to create JIRAs:
> >>>>>>>     - int96 ordering
> >>>>>>>     - int96 deprecation, replacement.
> >>>>>>>
> >>>>>>> - extra timestamp logical type:
> >>>>>>> - floating timestamp: (not TZ stored. up to the reader to interpret
> >>>>>>> TS
> >>>>>>> based on their TZ)
> >>>>>>>    - this would be better for following sql standard
> >>>>>>>    - Julien to create JIRA
> >>>>>>> - timestamp with timezone (per SQL):
> >>>>>>>    - each value has timezone
> >>>>>>>    - TZ can be different for each value
> >>>>>>>    - Julien to create JIRA
> >>>>>>>
> >>>>>>> - parquet-cpp 1.0 release
> >>>>>>>   - Uwe to update release script in master.
> >>>>>>>   - Uwe to launch a new vote with new RC
> >>>>>>>
> >>>>>>> - make impala depend on parquet-cpp
> >>>>>>>  - duplication between parquet/impala/kudu
> >>>>>>>  - need to measure level of overlap
> >>>>>>>  - Wes to open JIRA for this
> >>>>>>>  - also need an "apache commons for c++” for SQL type operations:
> >>>>>>>     -> could be in arrow
> >>>>>>>
> >>>>>>>  - metadata improvements.
> >>>>>>>   - add page level metadata in footer
> >>>>>>>   - page skipping.
> >>>>>>>   - Julien to open JIRA.
> >>>>>>>
> >>>>>>> - add version of the writer in the footer (more precise than
> >>>>>>> current).
> >>>>>>>   - Zoltan to open Jira
> >>>>>>>   - possibly add bitfield for bug fixes.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Feb 23, 2017 at 10:01 AM, Julien Le Dem <julien@dremio.com
> >
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Julien
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Julien
>
>


-- 
regards,
Deepak Majeti