Posted to dev@arrow.apache.org by Julien Le Dem <ju...@dremio.com> on 2016/08/11 16:27:33 UTC

finalizing V1 spec

With the goal of finalizing a spec for V1, I've created JIRAs on the format
component.
Here is the list below. Please discuss on the individual JIRAs if you have
comments.

- [ARROW-253] restrict ints to power-of-2 byte widths (8, 16, 32, 64 bits)
- [ARROW-254] remove Bit, since we use Boolean for the nullability array
  (validity vector)
- [ARROW-252] add implementation guidelines
- [ARROW-255] dictionary encoding spec
- [ARROW-258] clarify Buffer.{page,offset} in memory-sharing and RPC/file
  contexts
- [ARROW-256] add format version
- [ARROW-257] add a types vector to the Union type (to enable using type ids
  instead of the child offset)
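As background for ARROW-254, here is a minimal Python sketch of why a separate Bit type is redundant: a validity (nullability) vector and a Boolean array can share one bit-packed layout. It assumes least-significant-bit-first numbering (slot i lives in bit i % 8 of byte i // 8, with 1 meaning valid or true). `pack_bits` and `is_set` are hypothetical helpers, not part of any Arrow library API.

```python
from typing import List


def pack_bits(flags: List[bool]) -> bytes:
    """Pack booleans into bytes, LSB-first (assumed bit order; hypothetical helper)."""
    out = bytearray((len(flags) + 7) // 8)  # one bit per slot, rounded up to whole bytes
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def is_set(bitmap: bytes, i: int) -> bool:
    """Read slot i from an LSB-first bitmap (hypothetical helper)."""
    return ((bitmap[i // 8] >> (i % 8)) & 1) == 1


# Validity vector for the values [1, null, 3, null, 5]: 1 = valid, 0 = null.
validity = pack_bits([True, False, True, False, True])

# A Boolean column [True, True, False] uses exactly the same packing,
# so one bitmap representation covers both uses.
bool_values = pack_bits([True, True, False])

print(validity.hex())     # 15
print(bool_values.hex())  # 03
print([is_set(validity, i) for i in range(5)])  # [True, False, True, False, True]
```

Packing both vectors through the same helper is the point of the sketch: whatever bit order the spec settles on, readers need only one bitmap routine.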
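For ARROW-257, here is a minimal Python sketch of how a reader might resolve a dense-union slot once the Union type carries its own vector of type ids, so the id stored per slot no longer has to equal the child's position. The `DenseUnion` class, its field names, and the type ids 5 and 9 are hypothetical. The layout assumed here (one type id plus one offset per slot, and a metadata list mapping type ids to children) is an illustration, not the finalized spec.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class DenseUnion:
    """Hypothetical in-memory model of a dense union, for illustration only."""
    type_ids: List[int]        # metadata: type id declared for each child, in child order
    children: List[List[Any]]  # child arrays, in child order
    types: List[int]           # per-slot type id (assumed to be a small-int buffer)
    offsets: List[int]         # per-slot offset into the selected child

    def value(self, slot: int) -> Any:
        # Map the slot's type id to a child position via the metadata vector,
        # instead of assuming the type id is the child position.
        child_index = self.type_ids.index(self.types[slot])
        return self.children[child_index][self.offsets[slot]]


# Two children declared with non-contiguous type ids 5 (ints) and 9 (strings).
u = DenseUnion(
    type_ids=[5, 9],
    children=[[10, 20], ["a", "b"]],
    types=[5, 9, 9, 5],
    offsets=[0, 0, 1, 1],
)
print([u.value(i) for i in range(4)])  # [10, 'a', 'b', 20]
```

A real reader would probably build the type-id-to-child lookup table once per array rather than calling `list.index` for every slot.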

full list:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20component%20%3D%20Format%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC

-- 
Julien

Re: finalizing V1 spec

Posted by Julien Le Dem <ju...@dremio.com>.
The first version of the Java implementation of the file format is ready:
https://github.com/apache/arrow/pull/123
It will need some follow-up work, but I think we should merge it and iterate
on it.
From there it should be easy to validate Java/C++ interop.

On Wed, Aug 24, 2016 at 2:55 PM, Wes McKinney <we...@gmail.com> wrote:

> When the dust settles on the current metadata discussions, I can scope
> the work / JIRAs for the corresponding C++ implementation -- we will
> be able to reuse much of the IPC code that Micah has been working on.
>
> As a C++ housekeeping item, perhaps we should rename the arrow/ipc
> portion of the codebase to reflect more general data transport
> (IPC/RPC/file-based), e.g. arrow/transport and the arrow::transport
> namespace.
>



-- 
Julien

Re: finalizing V1 spec

Posted by Wes McKinney <we...@gmail.com>.
When the dust settles on the current metadata discussions, I can scope
the work / JIRAs for the corresponding C++ implementation -- we will
be able to reuse much of the IPC code that Micah has been working on.

As a C++ housekeeping item, perhaps we should rename the arrow/ipc
portion of the codebase to reflect more general data transport
(IPC/RPC/file-based), e.g. arrow/transport and the arrow::transport
namespace.

On Tue, Aug 23, 2016 at 7:20 PM, Julien Le Dem <ju...@dremio.com> wrote:
> I'm implementing the Arrow file format (based on RPC messages in a file) to
> validate the metadata on my end.
>
> --
> Julien

Re: finalizing V1 spec

Posted by Julien Le Dem <ju...@dremio.com>.
I'm implementing the Arrow file format (based on RPC messages in a file) to
validate the metadata on my end.

On Sun, Aug 21, 2016 at 9:28 PM, Wes McKinney <we...@gmail.com> wrote:

> Thank you, Julien, for taking the lead on this.
>
> It would be great to close out the remaining items in the V1
> metadata (I think we are close) so that we can start moving toward
> integration testing between the Java and C++ implementations.
>
> If anyone else has some time in the coming week or two to engage on the
> outstanding items (see also the other open pull requests), it would be
> most helpful.
>
> - Wes
>



-- 
Julien

Re: finalizing V1 spec

Posted by Wes McKinney <we...@gmail.com>.
Thank you, Julien, for taking the lead on this.

It would be great to close out the remaining items in the V1
metadata (I think we are close) so that we can start moving toward
integration testing between the Java and C++ implementations.

If anyone else has some time in the coming week or two to engage on the
outstanding items (see also the other open pull requests), it would be
most helpful.

- Wes

On Thu, Aug 11, 2016 at 9:27 AM, Julien Le Dem <ju...@dremio.com> wrote:
> With the goal of finalizing a spec for V1, I've created JIRAs on the format
> component.
> Here is the list below. Please discuss on the individual JIRAs if you have
> comments.
>
> - [ARROW-253] restrict ints to power-of-2 byte widths (8, 16, 32, 64 bits)
> - [ARROW-254] remove Bit, since we use Boolean for the nullability array
>   (validity vector)
> - [ARROW-252] add implementation guidelines
> - [ARROW-255] dictionary encoding spec
> - [ARROW-258] clarify Buffer.{page,offset} in memory-sharing and RPC/file
>   contexts
> - [ARROW-256] add format version
> - [ARROW-257] add a types vector to the Union type (to enable using type ids
>   instead of the child offset)
>
> full list:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20component%20%3D%20Format%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC
>
> --
> Julien