Posted to dev@arrow.apache.org by Jacques Nadeau <ja...@apache.org> on 2016/07/11 05:00:50 UTC

Arrow Board Report: Request for Feedback

Hello All,

Can I get some feedback and additional details to add to the Arrow Board
Report? See my draft below:

thanks,
Jacques


## Description:

Arrow is a columnar in-memory analytics layer designed to accelerate big
data. It houses a set of canonical in-memory representations of flat and
hierarchical data along with multiple language bindings for structure
manipulation. It also provides IPC and common algorithm implementations.

## Issues:

- There are no issues requiring board attention at this time.

## Activity:
- Awareness continues to increase, with the community having given
  presentations at various meetups as well as at the following conferences:
  PyData Paris, Hadoop Summit Ireland, Hadoop Summit San Jose and Berlin
  Buzzwords.
- The CPP and Python work has seen steady progress, and development of a
  Parquet <> Arrow interchange layer is improving. A small number of people
  are actively moving that forward.
- Java development has slowed some but appears to be picking up again.

## Health report:
- Discussion and code activity have been sporadic. We saw a good initial
  flurry of activity but need to make the project more approachable for new
  users. We can do this by doing the following:
  - We need to get a first release done.
  - We need to put together a quickstart and demo application for new users.

## PMC changes:

- Currently 17 PMC members.
- No new PMC members added in the last 3 months.
- Last PMC addition was Abdel Hakim Deneche on Tue Jan 20 2016

## Committer base changes:

- Currently 20 committers.
- No new committers added in the last 3 months.
- Last committer addition was Ippokratis Pandis on Thu Feb 18 2016

## Releases:

- No releases yet.

## JIRA activity:

- 71 JIRA tickets created in the last 3 months
- 40 JIRA tickets closed/resolved in the last 3 months

Re: Arrow Board Report: Request for Feedback

Posted by Wes McKinney <we...@gmail.com>.

+1, thanks Jacques

On Mon, Jul 11, 2016 at 2:05 PM, Julian Hyde <jh...@apache.org> wrote:
> +1 looks good!
>
>> On Jul 11, 2016, at 1:35 PM, Jacques Nadeau <ja...@apache.org> wrote:
>>
>> Good point. I added an additional bullet below.

Re: Arrow Board Report: Request for Feedback

Posted by Julian Hyde <jh...@apache.org>.

+1 looks good!

> On Jul 11, 2016, at 1:35 PM, Jacques Nadeau <ja...@apache.org> wrote:
> 
> Good point. I added an additional bullet below.

Re: Arrow Board Report: Request for Feedback

Posted by Jacques Nadeau <ja...@apache.org>.

Good point. I added an additional bullet below.


## Description:

Arrow is a columnar in-memory analytics layer designed to accelerate big
data. It houses a set of canonical in-memory representations of flat and
hierarchical data along with multiple language bindings for structure
manipulation. It also provides IPC and common algorithm implementations.

## Issues:

- There are no issues requiring board attention at this time.

## Activity:
- Awareness continues to increase, with the community having given
  presentations at various meetups as well as at the following conferences:
  PyData Paris, Hadoop Summit Ireland, Hadoop Summit San Jose and Berlin
  Buzzwords.
- The CPP work has made good progress.
- The cross-project work with Parquet has seen substantial progress (both in
  the Parquet project and the Arrow project). This should be a great first
  proof-of-concept integration showing the benefits of an in-memory columnar
  layer.
- There has been substantial progress on development of IPC / memory
  sharing.
- Java development has slowed some but appears to be picking up again.
- A new independent project called Feather is using Arrow as a format for
  writing to disk. This has also increased engagement with Arrow itself, and
  we have a number of excited communities including R & Python (and the
  Julia community experimenting).

## Health report:
- We've seen good discussion and development activity since the last report.
- We need to get to a first release.
  - Prior to doing so, the community is working on rudimentary integration
    tests between Java and C++ and a more formal format specification.
- More work can be done to make the project approachable to newly interested
  parties by creating additional documentation and a quickstart. A sample
  application will also help.

## PMC changes:

- Currently 17 PMC members.
- No new PMC members added in the last 3 months.
- Last PMC addition was Abdel Hakim Deneche on Tue Jan 20 2016

## Committer base changes:

- Currently 20 committers.
- No new committers added in the last 3 months.
- Last committer addition was Ippokratis Pandis on Thu Feb 18 2016

## Releases:

- No releases yet.

## JIRA activity:

- 71 JIRA tickets created in the last 3 months
- 40 JIRA tickets closed/resolved in the last 3 months

Re: Arrow Board Report: Request for Feedback

Posted by Uwe Korn <uw...@xhochy.com>.

Hello Jacques,

We could also add the Feather format [1] from Wes and Hadley Wickham to
the list. It uses the Arrow spec to provide a common columnar
interchange format for Python and R DataFrames. With speed and
interoperability, it has the same core values as Arrow itself. With it we
already have a user base with a native (non-JVM ;)) implementation of
Arrow. Through its buzz it also spurred some interest in Arrow in the
Julia community (though I'm not clear how far they actually got with
their prototype implementation).

Uwe

[1] https://github.com/wesm/feather

On 11.07.16 07:27, Jacques Nadeau wrote:
> An update incorporating your feedback:

Re: Arrow Board Report: Request for Feedback

Posted by Jacques Nadeau <ja...@apache.org>.

An update incorporating your feedback:

## Description:

Arrow is a columnar in-memory analytics layer designed to accelerate big
data. It houses a set of canonical in-memory representations of flat and
hierarchical data along with multiple language bindings for structure
manipulation. It also provides IPC and common algorithm implementations.

## Issues:

- There are no issues requiring board attention at this time.

## Activity:
- Awareness continues to increase, with the community having given
  presentations at various meetups as well as at the following conferences:
  PyData Paris, Hadoop Summit Ireland, Hadoop Summit San Jose and Berlin
  Buzzwords.
- The CPP work has made good progress.
- The cross-project work with Parquet has seen substantial progress (both in
  the Parquet project and the Arrow project). This should be a great first
  proof-of-concept integration showing the benefits of an in-memory columnar
  layer.
- There has been substantial progress on development of IPC / memory
  sharing.
- Java development has slowed some but appears to be picking up again.

## Health report:
- We've seen good discussion and development activity since the last report.
- We need to get to a first release.
  - Prior to doing so, the community is working on rudimentary integration
    tests between Java and C++ and a more formal format specification.
- More work can be done to make the project approachable to newly interested
  parties by creating additional documentation and a quickstart. A sample
  application will also help.

## PMC changes:

- Currently 17 PMC members.
- No new PMC members added in the last 3 months.
- Last PMC addition was Abdel Hakim Deneche on Tue Jan 20 2016

## Committer base changes:

- Currently 20 committers.
- No new committers added in the last 3 months.
- Last committer addition was Ippokratis Pandis on Thu Feb 18 2016

## Releases:

- No releases yet.

## JIRA activity:

- 71 JIRA tickets created in the last 3 months
- 40 JIRA tickets closed/resolved in the last 3 months

On Sun, Jul 10, 2016 at 10:07 PM, Wes McKinney <we...@gmail.com> wrote:

> hi Jacques,
>
> I would mention there's been a significant synergy between C++ efforts
> in Apache Parquet to build a bridge between the projects (which will
> be a nice proof-of-concept of the benefits of the common in-memory
> columnar layer).  Uwe Korn has been really active here with 33 commits
> to Parquet and 21 to Arrow.
>
> Micah Kornfield has been contributing significantly to the reification
> process of the specs and IPC / memory sharing procedure. We should
> prioritize assembling a more fully formed first-cut metadata spec and
> getting rudimentary integration tests working between the Java and C++
> implementations before we make a release.
>
> - Wes

Re: Arrow Board Report: Request for Feedback

Posted by Wes McKinney <we...@gmail.com>.

hi Jacques,

I would mention there's been a significant synergy between C++ efforts
in Apache Parquet to build a bridge between the projects (which will
be a nice proof-of-concept of the benefits of the common in-memory
columnar layer).  Uwe Korn has been really active here with 33 commits
to Parquet and 21 to Arrow.

Micah Kornfield has been contributing significantly to the reification
process of the specs and IPC / memory sharing procedure. We should
prioritize assembling a more fully formed first-cut metadata spec and
getting rudimentary integration tests working between the Java and C++
implementations before we make a release.

- Wes


On Sun, Jul 10, 2016 at 10:00 PM, Jacques Nadeau <ja...@apache.org> wrote:
> Hello All,
>
> Can I get some feedback and additional details to add to the Arrow Board
> Report. See my draft below:
>
> thanks,
> Jacques