Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2017/07/14 14:31:11 UTC

[DRAFT] Apache Arrow board report

If anyone has any comments on this quarter's board report or anything
to add, please let us know.

Thanks!
Wes

## Description:

Arrow is a columnar in-memory analytics layer designed to accelerate big data.
It houses a set of canonical in-memory representations of flat and hierarchical
data along with multiple language-bindings for structure manipulation. It also
provides IPC and common algorithm implementations.

## Issues:
- There are no issues requiring board attention at this time.

## Activity:

- Heavy development activity and growing community since the last board
  report. We have made 3 releases, with the next release 0.5.0 coming soon.

- The Ray project for machine learning from the UC Berkeley RISELab contributed
  a large software component, a shared memory object store ("Plasma"), to the
  Apache Arrow project.

- The Arrow 0.3.0 release on May 2 included C and Ruby bindings for the Arrow
  C++ libraries. We have also seen a native JavaScript (TypeScript)
  implementation appear for use.

- We have made significant progress toward completing compatibility between the
  Java and C++ implementations of the Arrow memory format. As soon as we
  achieve reasonable completeness, we should consider leaping to Arrow 1.0.0 to
  communicate to the rest of the open source world that Arrow is no longer as
  much of a work-in-progress and is ready for more widespread use.

- We have created the arrow-dist git repo to assist with cross-language and
  cross-platform packaging.

- Apache Spark has merged its first Arrow integration, SPARK-13534

- The external GPU Open Analytics Initiative is using Apache Arrow as its data
  interchange format

## Health report:

- Arrow is seeing an uptick in community interest and adoption. The increase in
  activity reflects the project's expanding scope (i.e. more programming
  languages) and increased use in other projects. We expect this trend to
  continue as Arrow comes to be perceived as more production-ready and stable.

## PMC changes:

 - Currently 19 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Uwe Korn on Wed Apr 12 2017

## Committer base changes:

 - Currently 22 committers.
 - Kouhei Sutou was added as a committer on Wed May 10 2017

## Releases:

 - 0.3.0 was released on Thu May 04 2017
 - 0.4.0 was released on Mon May 22 2017
 - 0.4.1 was released on Thu Jun 08 2017

## Mailing list activity:

 - We changed our JIRA notification scheme to send only issue *creation*
   e-mails to the primary mailing list, with further comments and edits going
   to issues@

 - dev@arrow.apache.org:
    - 547 subscribers (up 17 in the last 3 months)
    - 622 emails sent to list (1098 in previous quarter)

 - issues@arrow.apache.org:
    - 11 subscribers (up 0 in the last 3 months)
    - 1985 emails sent to list (1255 in previous quarter)

 - reviews@arrow.apache.org:
    - 9 subscribers (up 9 in the last 3 months)


## JIRA activity:

 - 395 JIRA tickets created in the last 3 months
 - 333 JIRA tickets closed/resolved in the last 3 months

Re: [DRAFT] Apache Arrow board report

Posted by Wes McKinney <we...@gmail.com>.

So sorry about that omission -- the turbodbc integration is huge
(getting a C++ API working across Python libraries)! Here's the updated
report:

## Description:

Arrow is a columnar in-memory analytics layer designed to accelerate big data.
It houses a set of canonical in-memory representations of flat and hierarchical
data along with multiple language-bindings for structure manipulation. It also
provides IPC and common algorithm implementations.

## Issues:
- There are no issues requiring board attention at this time.

## Activity:

- Heavy development activity and growing community since the last board
  report. We have made 3 releases, with the next release 0.5.0 coming soon.

- The Arrow 0.3.0 release on May 2 included C and Ruby bindings for the Arrow
  C++ libraries. We have also seen a native JavaScript (TypeScript)
  implementation appear for use.

- The TurbODBC C++ and Python project released version 2.0.0
  which included support for converting ODBC data to Apache
  Arrow. This was enabled by an internal C++ API to the Python
  Arrow bindings, and will help provide a blueprint for future
  third-party Python libraries that use Arrow.

- The Ray project for machine learning from the UC Berkeley RISELab contributed
  a large software component, a shared memory object store ("Plasma"), to the
  Apache Arrow project.

- We have made significant progress toward completing compatibility between the
  Java and C++ implementations of the Arrow memory format. As soon as we
  achieve reasonable completeness, we should consider leaping to Arrow 1.0.0 to
  communicate to the rest of the open source world that Arrow is no longer as
  much of a work-in-progress and is ready for more widespread use.

- We have created the arrow-dist git repo to assist with cross-language and
  cross-platform packaging.

- Apache Spark has merged its first Arrow integration, SPARK-13534

- The external GPU Open Analytics Initiative is using Apache Arrow as its data
  interchange format

## Health report:

- Arrow is seeing an uptick in community interest and adoption. The increase in
  activity reflects the project's expanding scope (i.e. more programming
  languages) and increased use in other projects. We expect this trend to
  continue as Arrow comes to be perceived as more production-ready and stable.

## PMC changes:

 - Currently 19 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Uwe Korn on Wed Apr 12 2017

## Committer base changes:

 - Currently 22 committers.
 - Kouhei Sutou was added as a committer on Wed May 10 2017

## Releases:

 - 0.3.0 was released on Thu May 04 2017
 - 0.4.0 was released on Mon May 22 2017
 - 0.4.1 was released on Thu Jun 08 2017

## Mailing list activity:

 - We changed our JIRA notification scheme to send only issue *creation*
   e-mails to the primary mailing list, with further comments and edits going
   to issues@

 - dev@arrow.apache.org:
    - 547 subscribers (up 17 in the last 3 months)
    - 622 emails sent to list (1098 in previous quarter)

 - issues@arrow.apache.org:
    - 11 subscribers (up 0 in the last 3 months)
    - 1985 emails sent to list (1255 in previous quarter)

 - reviews@arrow.apache.org:
    - 9 subscribers (up 9 in the last 3 months)


## JIRA activity:

 - 395 JIRA tickets created in the last 3 months
 - 333 JIRA tickets closed/resolved in the last 3 months

On Fri, Jul 14, 2017 at 10:47 AM, Uwe L. Korn <uw...@xhochy.com> wrote:
> The report looks fine. Not sure if we already mentioned turbodbc in the
> last report. If not, we should include it.
>
> Uwe
>
> On Fri, Jul 14, 2017, at 04:37 PM, Jacques Nadeau wrote:
>> Wes, thanks for pulling this together! We've been crazy busy getting ready
>> to launch. I'll post this in a few hours after anybody provides any
>> suggested additions/modifications.
>>
>> On Fri, Jul 14, 2017 at 7:31 AM, Wes McKinney <we...@gmail.com>
>> wrote:
>>
>> > If anyone has any comments on this quarter's board report or anything
>> > to add, please let us know.
>> >
>> > Thanks!
>> > Wes

Re: [DRAFT] Apache Arrow board report

Posted by "Uwe L. Korn" <uw...@xhochy.com>.

The report looks fine. Not sure if we already mentioned turbodbc in the
last report. If not, we should include it.

Uwe

On Fri, Jul 14, 2017, at 04:37 PM, Jacques Nadeau wrote:
> Wes, thanks for pulling this together! We've been crazy busy getting ready
> to launch. I'll post this in a few hours after anybody provides any
> suggested additions/modifications.
> 
> On Fri, Jul 14, 2017 at 7:31 AM, Wes McKinney <we...@gmail.com>
> wrote:
> 
> > If anyone has any comments on this quarter's board report or anything
> > to add, please let us know.
> >
> > Thanks!
> > Wes

Re: [DRAFT] Apache Arrow board report

Posted by Jacques Nadeau <ja...@apache.org>.

Wes, thanks for pulling this together! We've been crazy busy getting ready to
launch. I'll post this in a few hours after anybody provides any suggested
additions/modifications.

On Fri, Jul 14, 2017 at 7:31 AM, Wes McKinney <we...@gmail.com> wrote:

> If anyone has any comments on this quarter's board report or anything
> to add, please let us know.
>
> Thanks!
> Wes