Posted to dev@arrow.apache.org by Ian Cook <ia...@ursacomputing.com> on 2022/01/05 13:47:30 UTC

Arrow sync call January 5 at 12:00 US/Eastern, 17:00 UTC

Hi all,

Our biweekly sync call is today at 12:00 noon Eastern time.

The Zoom meeting URL for this and other biweekly Arrow sync calls is:
https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09

Alternatively, enter this information into the Zoom website or app to
join the call:
Meeting ID: 876 4903 3008
Passcode: 958092

Thanks,
Ian

Re: Arrow sync call January 5 at 12:00 US/Eastern, 17:00 UTC

Posted by Eduardo Ponce <ed...@gmail.com>.
Hi all,

With respect to what examples/information may be relevant to add/improve in
documentation, I find that browsing GitHub issues [1] is a good way to
identify some cases of how users are using Arrow. Moreover, many of the GH
issues related to code examples contain snippets of code in the responses
demonstrating a possible approach, which can be considered for documentation
examples and/or the Arrow Cookbook [2].

[1] https://github.com/apache/arrow/issues?q=is%3Aissue
[2] https://arrow.apache.org/cookbook

On Wed, Jan 5, 2022 at 1:21 PM Rok Mihevc <ro...@gmail.com> wrote:

> Attendees
>
> Nic Crane, Micah Kornfeld, Eduardo Ponce, Will Jones, Rok Mihevc,
> David Li, Niranda Perera, Benson Muite
>
>
> Agenda
>
> - Discussion about the new columnar memory layout
> - Preparing for 7.0.0 release - 2nd or 3rd week of January
> - Documentation improvement
> - Support for table-like structures (Apache Iceberg, Delta Lake)
>
>
> Minutes
>
> - Not enough stakeholders on the call to discuss the new layout
> proposal [1]. Micah might chime in on ML.
>
> - 7.0.0 release is scheduled for the 2nd or 3rd week of January.
> Please plan to complete PRs for 7.0.0 in time, or bump the Fix Version
> from 7.0.0 to 8.0.0 on any of your Jira issues that are not expected to
> be resolved in time [2]. See [3] to track the progress of the release.
>
> - Eduardo proposed a discussion of documentation improvement. The main
> pain point is sparse documentation of the C++ compute kernels available
> to users: the Cookbook is not very extensive yet, and there are few
> public examples of usage. Just browsing the public API shows many
> undocumented functionalities. Functions are documented in code with
> docstrings, but these are not used for documentation (?). There is a
> table of kernels [4] but it could be more verbose. Could we use
> docstrings?
> Jon says the R wrapper can pull C++ docstrings for its documentation,
> but the mapping of functionality is not always one-to-one.
> Eduardo: Another pain point is that internal abstractions are not well
> documented, which stalls new committers. Eduardo will open a PR for
> this. There are already two PRs in review to improve kernel docs: [5],
> [6].
>
> - Support for table-like structures discussion - Micah is interested
> in whether there is any progress in this area. Will looked into this
> and opened two Jiras for Delta Lake [7] and Iceberg [8]. Technically
> there are no issues implementing readers for either option, but there
> are some worries about governance/maintenance/licensing. We don’t have
> a reader for Avro, hence Will first looked into Delta Lake via the Rust
> reader.
>
>
> [1] https://lists.apache.org/thread/49qzofswg1r5z7zh39pjvd1m2ggz2kdq
> [2] https://lists.apache.org/thread/ng11x17yhvdfo8b3wgmd1qn40hy50g13
> [3] https://cwiki.apache.org/confluence/display/ARROW/Arrow+7.0.0+Release
> [4] https://arrow.apache.org/docs/cpp/compute.html
> [5] https://github.com/apache/arrow/pull/10296 - ARROW-12724: [C++]
> Add documentation for authoring compute kernels
> [6] https://github.com/apache/arrow/pull/12076 - ARROW-10317: [Python]
> Document compute function options
> [7] https://issues.apache.org/jira/browse/ARROW-14730
> [8] https://issues.apache.org/jira/browse/ARROW-15135
>

Re: Arrow sync call January 5 at 12:00 US/Eastern, 17:00 UTC

Posted by Rok Mihevc <ro...@gmail.com>.
Attendees

Nic Crane, Micah Kornfeld, Eduardo Ponce, Will Jones, Rok Mihevc,
David Li, Niranda Perera, Benson Muite


Agenda

- Discussion about the new columnar memory layout
- Preparing for 7.0.0 release - 2nd or 3rd week of January
- Documentation improvement
- Support for table-like structures (Apache Iceberg, Delta Lake)


Minutes

- Not enough stakeholders on the call to discuss the new layout
proposal [1]. Micah might chime in on ML.

- 7.0.0 release is scheduled for the 2nd or 3rd week of January.
Please plan to complete PRs for 7.0.0 in time, or bump the Fix Version
from 7.0.0 to 8.0.0 on any of your Jira issues that are not expected to
be resolved in time [2]. See [3] to track the progress of the release.

- Eduardo proposed a discussion of documentation improvement. The main
pain point is sparse documentation of the C++ compute kernels available
to users: the Cookbook is not very extensive yet, and there are few
public examples of usage. Just browsing the public API shows many
undocumented functionalities. Functions are documented in code with
docstrings, but these are not used for documentation (?). There is a
table of kernels [4] but it could be more verbose. Could we use
docstrings?
Jon says the R wrapper can pull C++ docstrings for its documentation,
but the mapping of functionality is not always one-to-one.
Eduardo: Another pain point is that internal abstractions are not well
documented, which stalls new committers. Eduardo will open a PR for
this. There are already two PRs in review to improve kernel docs: [5],
[6].

- Support for table-like structures discussion - Micah is interested
in whether there is any progress in this area. Will looked into this
and opened two Jiras for Delta Lake [7] and Iceberg [8]. Technically
there are no issues implementing readers for either option, but there
are some worries about governance/maintenance/licensing. We don’t have
a reader for Avro, hence Will first looked into Delta Lake via the Rust
reader.


[1] https://lists.apache.org/thread/49qzofswg1r5z7zh39pjvd1m2ggz2kdq
[2] https://lists.apache.org/thread/ng11x17yhvdfo8b3wgmd1qn40hy50g13
[3] https://cwiki.apache.org/confluence/display/ARROW/Arrow+7.0.0+Release
[4] https://arrow.apache.org/docs/cpp/compute.html
[5] https://github.com/apache/arrow/pull/10296 - ARROW-12724: [C++]
Add documentation for authoring compute kernels
[6] https://github.com/apache/arrow/pull/12076 - ARROW-10317: [Python]
Document compute function options
[7] https://issues.apache.org/jira/browse/ARROW-14730
[8] https://issues.apache.org/jira/browse/ARROW-15135