Posted to dev@arrow.apache.org by Julien Le Dem <ju...@dremio.com> on 2017/05/31 16:16:10 UTC

Arrow sync in 15 min

The Arrow sync is at 9:30 am PT today on Google Hangouts:
https://hangouts.google.com/hangouts/_/dremio.com/arrow

-- 
Julien

Re: Arrow sync in 15 min

Posted by Julien Le Dem <ju...@dremio.com>.
Next sync: 6/21 at 9:30 am PT on Google Hangouts





-- 
Julien

Re: Arrow sync in 15 min

Posted by Julien Le Dem <ju...@dremio.com>.
Notes:

Attendees/agenda building
Wes (Two Sigma):
 - REST API
 - Roadmap
 - communicating with the community
Uwe (Blue Yonder):
 - git tag for versioning
Julien (Dremio):
 - Timestamp
 - REST API
 - Roadmap

Discussion:
 - git tag for versioning
    - development package version names are derived from the latest tag
reachable in master's history plus the number of commits since that tag.
    - since the release tag is on a branch rather than on master, the
version string is computed from an older tag and is misleading
    - options:
       - add a tag {release version}.post on the first commit after the
release to get a better dev version string
       - rebase master on top of the last release (0.4)
    - we decided to rebase master (the only change is adding the commit
that updates the version number in pom files)
 - Timestamp in Arrow and Parquet:
    - Both support “timezone naive” timestamps (aka “timestamp without
timezone” in SQL)
        - in Arrow when timezone field is missing in Timestamp type:
https://github.com/apache/arrow/blob/5899800f53f3c3fffc0db95294c4f0eb0e556228/format/Schema.fbs#L117
        - in Parquet (proposed PR) when isAdjustedToUTC is false:
https://github.com/apache/parquet-format/pull/51/files#diff-0f9d1b5347959e15259da7ba8f4b6252R242
    - They also both support a “Timezone aware” timestamp (aka “timestamp
with timezone” in SQL)
        - in Arrow when the timezone field is present with the original
timezone.
        - in Parquet when isAdjustedToUTC is true
    - So Arrow carries more information than Parquet (the original
timezone), and it needs that information to be present, since a missing
timezone field means “timezone naive”
    - conclusion:
        - when writing to Parquet, we should use isAdjustedToUTC = false
only if the timezone is unknown
        - when reading from Parquet, we will populate the timezone field
with UTC when isAdjustedToUTC == true (and leave it missing otherwise)
 - REST API:
   - review doc here:
https://docs.google.com/document/d/1N4TP6zARRs2c4_h-4WqCqIFVPQwmxOmXel1V3AxpGok/edit#
 - Roadmap:
    - todo: write a blog post describing the direction of Arrow
    - among the directions:
       - REST API and generalizing messaging
       - C++ analytics library for interacting with Arrow memory. Tools for
wrapping existing data structures (e.g. an array of doubles)
       - Arrow for GPU
       - Arrow ODBC interface: turbodbc
       - Spark integration improvements: group UDFs, etc.
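
For illustration, the Timestamp conclusion in the notes above could be
sketched in Python roughly as follows. This is only a sketch under stated
assumptions: ArrowTimestamp and ParquetTimestamp are hypothetical stand-ins
for the type metadata, not the actual Arrow or Parquet APIs, and only the
timezone / isAdjustedToUTC mapping is modeled (no timestamp values).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArrowTimestamp:
    # Hypothetical stand-in for Arrow's Timestamp type metadata.
    # timezone=None models a missing timezone field ("timezone naive").
    timezone: Optional[str] = None


@dataclass(frozen=True)
class ParquetTimestamp:
    # Hypothetical stand-in for the proposed Parquet timestamp annotation.
    is_adjusted_to_utc: bool


def arrow_to_parquet(ts: ArrowTimestamp) -> ParquetTimestamp:
    # Writing: isAdjustedToUTC is false only when no timezone is known.
    return ParquetTimestamp(is_adjusted_to_utc=ts.timezone is not None)


def parquet_to_arrow(ts: ParquetTimestamp) -> ArrowTimestamp:
    # Reading: populate the timezone with UTC when isAdjustedToUTC is true,
    # otherwise leave the timezone field missing.
    return ArrowTimestamp(timezone="UTC" if ts.is_adjusted_to_utc else None)
```

Note that under this mapping a round trip does not preserve the original
timezone: an Arrow timestamp type with any timezone comes back from Parquet
with the timezone set to UTC, which is consistent with Arrow carrying more
information than Parquet here.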

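The versioning issue from the notes can also be illustrated with a small
sketch. It assumes, hypothetically, that the dev version is built from a
string shaped like the output of `git describe --tags`
(<tag>-<commit count>-g<abbreviated sha>); the ".dev" suffix and the example
version numbers are made up for illustration and are not the project's
actual scheme.

```python
import re


def dev_version(describe: str) -> str:
    """Build a dev version from '<tag>-<count>-g<sha>' (hypothetical format)."""
    m = re.fullmatch(r"(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)", describe)
    if m is None:
        # No commit-count suffix: the commit is exactly on a tag.
        return describe
    return f"{m['tag']}.dev{m['count']}"


# If the release tag is only on a branch, master still sees the older tag:
on_master_before_rebase = dev_version("0.3.0-120-gabc1234")
# After rebasing master on top of the release, the newer tag is reachable:
on_master_after_rebase = dev_version("0.4.0-1-gdef5678")
```

With these made-up inputs the first call yields a version based on the
older tag even though a newer release exists, which is the misleading
behavior described in the notes; the second shows the effect of rebasing.
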



-- 
Julien