Posted to dev@arrow.apache.org by Julien Le Dem <ju...@dremio.com> on 2017/05/31 16:16:10 UTC
Arrow sync in 15 min
The Arrow sync is at 9:30 am PT today on Google Hangouts
https://hangouts.google.com/hangouts/_/dremio.com/arrow
--
Julien
Re: Arrow sync in 15 min
Posted by Julien Le Dem <ju...@dremio.com>.
Next sync: 6/21 at 9:30 am PT on Google Hangouts
--
Julien
Re: Arrow sync in 15 min
Posted by Julien Le Dem <ju...@dremio.com>.
Notes:
Attendees/agenda building
Wes (TwoSigma):
- REST API
- Roadmap
- communicate with community
Uwe (Blue Yonder):
- git tag for versioning
Julien (Dremio):
- Timestamp
- REST API
- Roadmap
Discussion:
- git tag for versioning:
  - development package version names are based on the latest tag in the
    history of master + the commit count since then
  - since the release tag is on a branch, the dev version is computed from
    an older tag and is misleading
  - options:
    - add a tag {release version}.post on the first commit after the
      release to get a better dev version string
    - rebase master on top of the last release (0.4)
  - we decided to rebase master (the only change is adding the commit
    that updates the version number in pom files)
- Timestamp in Arrow and Parquet:
  - Both support "timezone naive" timestamps (aka "timestamp without
    timezone" in SQL):
    - in Arrow, when the timezone field is missing in the Timestamp type:
      https://github.com/apache/arrow/blob/5899800f53f3c3fffc0db95294c4f0eb0e556228/format/Schema.fbs#L117
    - in Parquet (proposed PR), when isAdjustedToUTC is false:
      https://github.com/apache/parquet-format/pull/51/files#diff-0f9d1b5347959e15259da7ba8f4b6252R242
  - Both also support "timezone aware" timestamps (aka "timestamp with
    timezone" in SQL):
    - in Arrow, when the timezone field is present and holds the original
      timezone
    - in Parquet, when isAdjustedToUTC is true
  - So Arrow carries more information (the original timezone), and it
    requires this extra information, since its absence means "timezone
    naive"
  - conclusion:
    - when writing to Parquet, we should use isAdjustedToUTC = false
      only if there is no knowledge of the timezone
    - when reading from Parquet, we will populate timezone with UTC
      when isAdjustedToUTC == true (and leave it missing otherwise)
- REST API:
  - review doc here:
    https://docs.google.com/document/d/1N4TP6zARRs2c4_h-4WqCqIFVPQwmxOmXel1V3AxpGok/edit#
- Roadmap:
  - todo: blog post to describe the direction of Arrow
  - among those directions:
    - REST API and generalizing messaging
    - C++ analytics library for interacting with Arrow memory; tools for
      wrapping existing data structures (e.g. an array of doubles)
    - Arrow for GPU
    - Arrow ODBC interface: turbodbc
    - Spark integration improvements: group UDFs, etc.
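
The timestamp conclusion above can be sketched in a few lines of Python. This is only an illustration of the agreed read/write rule, not the Arrow or Parquet API: the classes ArrowTimestampType and ParquetTimestampType and the functions arrow_to_parquet and parquet_to_arrow are hypothetical names made up for this example, and the sketch covers type metadata only (it does not convert any timestamp values).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArrowTimestampType:
    """Hypothetical stand-in for Arrow's Timestamp type (not the real API)."""
    unit: str
    timezone: Optional[str] = None  # None means "timezone naive"


@dataclass(frozen=True)
class ParquetTimestampType:
    """Hypothetical stand-in for the proposed Parquet timestamp annotation."""
    unit: str
    is_adjusted_to_utc: bool


def arrow_to_parquet(arrow_type: ArrowTimestampType) -> ParquetTimestampType:
    # Write rule: isAdjustedToUTC = false only when no timezone is known.
    return ParquetTimestampType(arrow_type.unit, arrow_type.timezone is not None)


def parquet_to_arrow(parquet_type: ParquetTimestampType) -> ArrowTimestampType:
    # Read rule: populate the timezone with UTC when isAdjustedToUTC is true,
    # and leave it missing otherwise.
    tz = "UTC" if parquet_type.is_adjusted_to_utc else None
    return ArrowTimestampType(parquet_type.unit, tz)


# A naive type survives a round trip unchanged.
naive = ArrowTimestampType("ms")
assert parquet_to_arrow(arrow_to_parquet(naive)) == naive

# An aware type comes back as UTC: the original zone name is not kept
# by the Parquet flag alone.
aware = ArrowTimestampType("ms", "America/Los_Angeles")
assert parquet_to_arrow(arrow_to_parquet(aware)).timezone == "UTC"
```

Under these assumptions, the round trip shows why Arrow is said to carry more information: the Parquet flag records only whether a timezone was known, so the original zone name does not come back on read.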
--
Julien