Posted to dev@parquet.apache.org by Julien Le Dem <ju...@dremio.com> on 2016/09/06 23:52:07 UTC

Arrow-Parquet integration location (Was: Arrow cpp travis-ci build broken)

Thanks Wes,
No worries, I know you are on top of those things.
On a side note, I was wondering if the arrow-parquet integration should be
in Parquet instead.
Parquet would depend on Arrow and not the other way around.
Arrow provides the API and each storage layer (Parquet, Kudu, Cassandra,
...) provides a way to produce Arrow Record Batches.
thoughts?
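
To make the proposal concrete, here is a minimal C++ sketch of what such a
producer contract could look like. The interface and namespace names are
invented for this sketch, and the arrow:: types and the "arrow/api.h" header
are assumed from the Arrow C++ tree; treat it as an illustration, not an
existing API:

    // Illustrative only: a hypothetical interface that a storage layer
    // (Parquet, Kudu, Cassandra, ...) could implement to expose its data as
    // Arrow record batches. Only the arrow:: types are assumed to exist.
    #include <memory>

    #include "arrow/api.h"  // assumed umbrella header for Schema, RecordBatch, Status

    namespace storage {

    class RecordBatchProducer {
     public:
      virtual ~RecordBatchProducer() = default;

      // Schema shared by every batch this source produces.
      virtual std::shared_ptr<arrow::Schema> schema() const = 0;

      // Produce the next batch; implementations set *out to nullptr when the
      // source is exhausted.
      virtual arrow::Status Next(std::shared_ptr<arrow::RecordBatch>* out) = 0;
    };

    }  // namespace storage

Each storage layer would ship its own implementation of something like this,
while consumers only program against the Arrow types.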

On Tue, Sep 6, 2016 at 3:37 PM, Wes McKinney <we...@gmail.com> wrote:

> hi Julien,
>
> I'm very sorry about the inconvenience with this and the delay in
> getting it sorted out. I will triage this evening by disabling the
> Parquet tests in Arrow until we get the current problems under
> control. When we re-enable the Parquet tests in Travis CI I agree we
> should pin the version SHA.
>
> - Wes
>
> On Tue, Sep 6, 2016 at 5:30 PM, Julien Le Dem <ju...@dremio.com> wrote:
> > The Arrow cpp travis-ci build is broken right now because it depends on
> > parquet-cpp which has changed in an incompatible way. [1] [2] (or so it
> > looks to me)
> > Since parquet-cpp is not released yet it is totally fine to make
> > incompatible API changes.
> > However, we may want to pin the Arrow to Parquet dependency (on a git sha?)
> > to prevent cross project changes from breaking the master build.
> > Since I'm not one of the core cpp dev on those projects I mainly want to
> > start that conversation rather than prescribe a solution. Feel free to take
> > this as a straw man and suggest something else.
> >
> > [1] https://travis-ci.org/apache/arrow/jobs/156080555
> > [2] https://github.com/apache/arrow/blob/2d8ec789365f3c0f82b1f22d76160d5af150dd31/ci/travis_before_script_cpp.sh
> >
> >
> > --
> > Julien
>



-- 
Julien

Re: Arrow-Parquet integration location (Was: Arrow cpp travis-ci build broken)

Posted by Julien Le Dem <ju...@dremio.com>.
@Wes, Uwe: Thank you!

@Brian: no procedure required :) Thanks for your feedback.
We're happy to hear more about SAS integration. Feel free to send a blurb
to the list.

On Tue, Sep 6, 2016 at 9:51 PM, Uwe Korn <uw...@xhochy.com> wrote:

> Hello,
>
> I'm also in favour of switching the dependency direction between Parquet
> and Arrow as this would avoid a lot of duplicate code in both projects as
> well as parquet-cpp profiting from functionality that is available in Arrow.
>
> @wesm: go ahead with the JIRAs and I'll add comments or will pick some of
> them up.
>
> Cheers
>
> Uwe
>
>
>
> On 07.09.16 04:41, Wes McKinney wrote:
>
>> hi Julien,
>>
>> It makes sense to move the Parquet support for Arrow into Parquet
>> itself and invert the dependency. I had thought that the coupling to
>> Arrow C++'s IO subsystem might be tighter, but the connection between
>> memory allocators and file abstractions is fairly simple:
>>
>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/parquet/io.h
>>
>> I'll open appropriate JIRAs and Uwe and I can coordinate on the
>> refactoring.
>>
>> The exposure of the Parquet functionality in Python should stay inside
>> Arrow for now, but mainly because it would make developing the Python
>> side of things much more difficult if we split things up right now.
>>
>> - Wes
>>
>> On Tue, Sep 6, 2016 at 8:27 PM, Brian Bowman <Br...@sas.com>
>> wrote:
>>
>>> Forgive me if interposing my first post for the Apache Arrow project on
>>> this thread is incorrect procedure.
>>>
>>> What Julien proposes with each storage layer producing Arrow Record
>>> Batches is exactly how I envision it working and would certainly make Arrow
>>> integration with SAS much more palatable.  This is likely true for other
>>> storage layer providers as well.
>>>
>>> Brian Bowman (SAS)
>>>
>>> On Sep 6, 2016, at 7:52 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>>>
>>>> Thanks Wes,
>>>> No worries, I know you are on top of those things.
>>>> On a side note, I was wondering if the arrow-parquet integration should be
>>>> in Parquet instead.
>>>> Parquet would depend on Arrow and not the other way around.
>>>> Arrow provides the API and each storage layer (Parquet, Kudu, Cassandra,
>>>> ...) provides a way to produce Arrow Record Batches.
>>>> thoughts?
>>>>
>>>> On Tue, Sep 6, 2016 at 3:37 PM, Wes McKinney <we...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> hi Julien,
>>>>>
>>>>> I'm very sorry about the inconvenience with this and the delay in
>>>>> getting it sorted out. I will triage this evening by disabling the
>>>>> Parquet tests in Arrow until we get the current problems under
>>>>> control. When we re-enable the Parquet tests in Travis CI I agree we
>>>>> should pin the version SHA.
>>>>>
>>>>> - Wes
>>>>>
>>>>> On Tue, Sep 6, 2016 at 5:30 PM, Julien Le Dem <ju...@dremio.com>
>>>>>> wrote:
>>>>>> The Arrow cpp travis-ci build is broken right now because it depends on
>>>>>> parquet-cpp which has changed in an incompatible way. [1] [2] (or so it
>>>>>> looks to me)
>>>>>> Since parquet-cpp is not released yet it is totally fine to make
>>>>>> incompatible API changes.
>>>>>> However, we may want to pin the Arrow to Parquet dependency (on a git sha?)
>>>>>> to prevent cross project changes from breaking the master build.
>>>>>> Since I'm not one of the core cpp dev on those projects I mainly want to
>>>>>> start that conversation rather than prescribe a solution. Feel free to take
>>>>>> this as a straw man and suggest something else.
>>>>>>
>>>>>> [1] https://travis-ci.org/apache/arrow/jobs/156080555
>>>>>> [2] https://github.com/apache/arrow/blob/2d8ec789365f3c0f82b1f22d76160d5af150dd31/ci/travis_before_script_cpp.sh
>>>>>>
>>>>>> --
>>>>>> Julien
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Julien
>>>>
>>>
>


-- 
Julien

Re: Arrow-Parquet integration location (Was: Arrow cpp travis-ci build broken)

Posted by Uwe Korn <uw...@xhochy.com>.
Hello,

I'm also in favour of switching the dependency direction between Parquet
and Arrow, as this would avoid a lot of duplicate code in both projects
and let parquet-cpp profit from functionality that is already available
in Arrow.

@wesm: go ahead with the JIRAs and I'll add comments or will pick some 
of them up.

Cheers

Uwe


On 07.09.16 04:41, Wes McKinney wrote:
> hi Julien,
>
> It makes sense to move the Parquet support for Arrow into Parquet
> itself and invert the dependency. I had thought that the coupling to
> Arrow C++'s IO subsystem might be tighter, but the connection between
> memory allocators and file abstractions is fairly simple:
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/parquet/io.h
>
> I'll open appropriate JIRAs and Uwe and I can coordinate on the refactoring.
>
> The exposure of the Parquet functionality in Python should stay inside
> Arrow for now, but mainly because it would make developing the Python
> side of things much more difficult if we split things up right now.
>
> - Wes
>
> On Tue, Sep 6, 2016 at 8:27 PM, Brian Bowman <Br...@sas.com> wrote:
>> Forgive me if interposing my first post for the Apache Arrow project on this thread is incorrect procedure.
>>
>> What Julien proposes with each storage layer producing Arrow Record Batches is exactly how I envision it working and would certainly make Arrow integration with SAS much more palatable.  This is likely true for other storage layer providers as well.
>>
>> Brian Bowman (SAS)
>>
>>> On Sep 6, 2016, at 7:52 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>>
>>> Thanks Wes,
>>> No worries, I know you are on top of those things.
>>> On a side note, I was wondering if the arrow-parquet integration should be
>>> in Parquet instead.
>>> Parquet would depend on Arrow and not the other way around.
>>> Arrow provides the API and each storage layer (Parquet, Kudu, Cassandra,
>>> ...) provides a way to produce Arrow Record Batches.
>>> thoughts?
>>>
>>>> On Tue, Sep 6, 2016 at 3:37 PM, Wes McKinney <we...@gmail.com> wrote:
>>>>
>>>> hi Julien,
>>>>
>>>> I'm very sorry about the inconvenience with this and the delay in
>>>> getting it sorted out. I will triage this evening by disabling the
>>>> Parquet tests in Arrow until we get the current problems under
>>>> control. When we re-enable the Parquet tests in Travis CI I agree we
>>>> should pin the version SHA.
>>>>
>>>> - Wes
>>>>
>>>>> On Tue, Sep 6, 2016 at 5:30 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>>>> The Arrow cpp travis-ci build is broken right now because it depends on
>>>>> parquet-cpp which has changed in an incompatible way. [1] [2] (or so it
>>>>> looks to me)
>>>>> Since parquet-cpp is not released yet it is totally fine to make
>>>>> incompatible API changes.
>>>>> However, we may want to pin the Arrow to Parquet dependency (on a git sha?)
>>>>> to prevent cross project changes from breaking the master build.
>>>>> Since I'm not one of the core cpp dev on those projects I mainly want to
>>>>> start that conversation rather than prescribe a solution. Feel free to take
>>>>> this as a straw man and suggest something else.
>>>>>
>>>>> [1] https://travis-ci.org/apache/arrow/jobs/156080555
>>>>> [2] https://github.com/apache/arrow/blob/2d8ec789365f3c0f82b1f22d76160d5af150dd31/ci/travis_before_script_cpp.sh
>>>>>
>>>>> --
>>>>> Julien
>>>
>>>
>>> --
>>> Julien


Re: Arrow-Parquet integration location (Was: Arrow cpp travis-ci build broken)

Posted by Wes McKinney <we...@gmail.com>.
hi Julien,

It makes sense to move the Parquet support for Arrow into Parquet
itself and invert the dependency. I had thought that the coupling to
Arrow C++'s IO subsystem might be tighter, but the connection between
memory allocators and file abstractions is fairly simple:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/parquet/io.h
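
As a rough illustration of how thin that boundary is, the glue mostly needs a
random access read hook plus an allocator hook, roughly of this shape. The
class names below are invented for the sketch and are not the actual arrow::
or arrow::io API; see the io.h link above for the real definitions:

    // Illustrative stand-ins for the two abstractions the glue depends on.
    #include <cstdint>

    class MemoryAllocator {  // stand-in for a memory pool
     public:
      virtual ~MemoryAllocator() = default;
      virtual uint8_t* Allocate(int64_t nbytes) = 0;
      virtual void Free(uint8_t* buffer, int64_t nbytes) = 0;
    };

    class RandomAccessSource {  // stand-in for a readable file abstraction
     public:
      virtual ~RandomAccessSource() = default;
      virtual int64_t Size() const = 0;
      // Read up to nbytes starting at position into out; return bytes read.
      virtual int64_t ReadAt(int64_t position, int64_t nbytes, uint8_t* out) = 0;
    };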

I'll open appropriate JIRAs and Uwe and I can coordinate on the refactoring.

The exposure of the Parquet functionality in Python should stay inside
Arrow for now, mainly because splitting things up right now would make
developing the Python side of things much more difficult.

- Wes
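
As a caller-side view of the inverted dependency, a declaration along these
lines would live in parquet-cpp and link against Arrow, rather than Arrow
linking against parquet-cpp. The function and namespace names are hypothetical
and the arrow:: types are assumed; this is not the parquet-cpp API of the time:

    // Hypothetical declaration inside parquet-cpp: Parquet code depends on
    // Arrow types (Status, Table), not the other way around.
    #include <memory>
    #include <string>

    #include "arrow/api.h"  // Arrow becomes the upstream dependency

    namespace parquet_sketch {

    // Read a whole Parquet file into an Arrow table.
    arrow::Status ReadFileToArrowTable(const std::string& path,
                                       std::shared_ptr<arrow::Table>* out);

    }  // namespace parquet_sketch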

On Tue, Sep 6, 2016 at 8:27 PM, Brian Bowman <Br...@sas.com> wrote:
> Forgive me if interposing my first post for the Apache Arrow project on this thread is incorrect procedure.
>
> What Julien proposes with each storage layer producing Arrow Record Batches is exactly how I envision it working and would certainly make Arrow integration with SAS much more palatable.  This is likely true for other storage layer providers as well.
>
> Brian Bowman (SAS)
>
>> On Sep 6, 2016, at 7:52 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>
>> Thanks Wes,
>> No worries, I know you are on top of those things.
>> On a side note, I was wondering if the arrow-parquet integration should be
>> in Parquet instead.
>> Parquet would depend on Arrow and not the other way around.
>> Arrow provides the API and each storage layer (Parquet, Kudu, Cassandra,
>> ...) provides a way to produce Arrow Record Batches.
>> thoughts?
>>
>>> On Tue, Sep 6, 2016 at 3:37 PM, Wes McKinney <we...@gmail.com> wrote:
>>>
>>> hi Julien,
>>>
>>> I'm very sorry about the inconvenience with this and the delay in
>>> getting it sorted out. I will triage this evening by disabling the
>>> Parquet tests in Arrow until we get the current problems under
>>> control. When we re-enable the Parquet tests in Travis CI I agree we
>>> should pin the version SHA.
>>>
>>> - Wes
>>>
>>>> On Tue, Sep 6, 2016 at 5:30 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>>> The Arrow cpp travis-ci build is broken right now because it depends on
>>>> parquet-cpp which has changed in an incompatible way. [1] [2] (or so it
>>>> looks to me)
>>>> Since parquet-cpp is not released yet it is totally fine to make
>>>> incompatible API changes.
>>>> However, we may want to pin the Arrow to Parquet dependency (on a git sha?)
>>>> to prevent cross project changes from breaking the master build.
>>>> Since I'm not one of the core cpp dev on those projects I mainly want to
>>>> start that conversation rather than prescribe a solution. Feel free to take
>>>> this as a straw man and suggest something else.
>>>>
>>>> [1] https://travis-ci.org/apache/arrow/jobs/156080555
>>>> [2] https://github.com/apache/arrow/blob/2d8ec789365f3c0f82b1f22d76160d5af150dd31/ci/travis_before_script_cpp.sh
>>>>
>>>>
>>>> --
>>>> Julien
>>
>>
>>
>> --
>> Julien

Re: Arrow-Parquet integration location (Was: Arrow cpp travis-ci build broken)

Posted by Brian Bowman <Br...@sas.com>.
Forgive me if interposing my first post for the Apache Arrow project on this thread is incorrect procedure. 

What Julien proposes with each storage layer producing Arrow Record Batches is exactly how I envision it working and would certainly make Arrow integration with SAS much more palatable.  This is likely true for other storage layer providers as well. 

Brian Bowman (SAS)

> On Sep 6, 2016, at 7:52 PM, Julien Le Dem <ju...@dremio.com> wrote:
> 
> Thanks Wes,
> No worries, I know you are on top of those things.
> On a side note, I was wondering if the arrow-parquet integration should be
> in Parquet instead.
> Parquet would depend on Arrow and not the other way around.
> Arrow provides the API and each storage layer (Parquet, Kudu, Cassandra,
> ...) provides a way to produce Arrow Record Batches.
> thoughts?
> 
>> On Tue, Sep 6, 2016 at 3:37 PM, Wes McKinney <we...@gmail.com> wrote:
>> 
>> hi Julien,
>> 
>> I'm very sorry about the inconvenience with this and the delay in
>> getting it sorted out. I will triage this evening by disabling the
>> Parquet tests in Arrow until we get the current problems under
>> control. When we re-enable the Parquet tests in Travis CI I agree we
>> should pin the version SHA.
>> 
>> - Wes
>> 
>>> On Tue, Sep 6, 2016 at 5:30 PM, Julien Le Dem <ju...@dremio.com> wrote:
>>> The Arrow cpp travis-ci build is broken right now because it depends on
>>> parquet-cpp which has changed in an incompatible way. [1] [2] (or so it
>>> looks to me)
>>> Since parquet-cpp is not released yet it is totally fine to make
>>> incompatible API changes.
>>> However, we may want to pin the Arrow to Parquet dependency (on a git sha?)
>>> to prevent cross project changes from breaking the master build.
>>> Since I'm not one of the core cpp dev on those projects I mainly want to
>>> start that conversation rather than prescribe a solution. Feel free to take
>>> this as a straw man and suggest something else.
>>> 
>>> [1] https://travis-ci.org/apache/arrow/jobs/156080555
>>> [2] https://github.com/apache/arrow/blob/2d8ec789365f3c0f82b1f22d76160d5af150dd31/ci/travis_before_script_cpp.sh
>>> 
>>> 
>>> --
>>> Julien
> 
> 
> 
> -- 
> Julien