Posted to dev@arrow.apache.org by Alessandro Molina <al...@ursacomputing.com> on 2021/07/07 15:52:05 UTC

Re: Apache Arrow Cookbook

We finally have a first preview of the cookbook available for R and Python;
for anyone interested, the two versions are visible at
http://ursacomputing.com/arrow-cookbook/py/index.html and
http://ursacomputing.com/arrow-cookbook/r/index.html
A new version of the cookbook is published automatically whenever a new
recipe is added.

After gathering feedback from interested parties and users, our plan for
the next step is to open a PR against the arrow repository and
automate publishing the cookbook via GitHub Actions.

So far, nearly half of the recipes identified in the dedicated Google Doc (
https://docs.google.com/document/d/1v-jK_9osnLvAnAjLOM_frgzakjFhLpUi8OC0MlKpxzw/edit?ts=60c73189#heading=h.m7fas2talgy5
) have been implemented, so if you have recipes to suggest, feel free to
leave comments on that document or suggest edits.


On Mon, Jun 21, 2021 at 10:34 AM Alessandro Molina <
alessandro@ursacomputing.com> wrote:

> Hi,
>
> I'd like to share with the ML an idea that Nic Crane and I have been
> experimenting with. It's still in the early stages, but we hope to turn it
> into a PR for Arrow documentation soon.
>
> The idea is to work on a Cookbook, a collection of ready-made recipes on
> how to use Arrow, that both end users and developers of third-party
> libraries can refer to when they need to look up "the arrow way" of doing
> something.
>
> While the arrow documentation covers all the features and functions
> available in arrow, it's not always obvious to a new user how best to
> combine them. Sometimes the solution ends up being more complicated than
> necessary, or performs badly due to non-obvious side effects like
> unexpected memory copies.
>
> For this reason we thought about starting a set of documentation that users
> can refer to on how to combine arrow features to achieve the results they
> care about.
>
> We wrote a short document explaining the idea at
> https://docs.google.com/document/d/1v-jK_9osnLvAnAjLOM_frgzakjFhLpUi8OC0MlKpxzw/edit?usp=sharing
>
> The core idea behind the cookbook is that all recipes should be testable,
> so it should be possible to add a CI phase for the cookbook that verifies
> that all the recipes still work with the current version of Arrow and lead
> to the expected results.
>
> For now we have started it in a separate repository (
> https://github.com/ursacomputing/arrow-cookbook ), but we are still unsure
> whether it should live inside arrow/docs, in its own directory (i.e.
> arrow/cookbook), or in its own repository. In the end it's fairly decoupled
> from the rest of Arrow and the documentation, which would have the benefit
> of allowing a dedicated release cycle every time new recipes are added (at
> least in the early phase).
>
> We are also looking for more ideas about recipes that would be good
> candidates for inclusion, so if any of you has thoughts about which recipes
> we should add please feel free to comment on the document or reply by mail
> suggesting more recipes.
>
> Any suggestion for improvements is appreciated! We hope to have something
> we can release with the next Arrow release.
>
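
The idea that every recipe should be testable can be illustrated with Python's standard-library doctest module: a recipe written as an interactive session carries its own expected output, and a CI step can re-run it and fail when the output no longer matches. The sketch below is only an illustration under that assumption; the RECIPE text and the check_recipe and ci_exit_code helpers are hypothetical, not the cookbook's actual tooling, and plain Python stands in for Arrow calls so it runs without pyarrow installed.

```python
import doctest

# Hypothetical recipe page: prose plus an interactive session whose
# expected output is recorded inline. Plain Python stands in for Arrow
# calls so the sketch runs without pyarrow installed.
RECIPE = """
Summing a column of values
--------------------------

>>> values = [1, 2, 3, 4]
>>> sum(values)
10
"""


def check_recipe(text: str, name: str = "recipe") -> doctest.TestResults:
    """Re-run every example in `text` and compare it with the recorded output."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, globs={}, name=name, filename=None, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    # Mismatches are reported on stdout; the counts come back as TestResults.
    return runner.run(test)


def ci_exit_code(results: doctest.TestResults) -> int:
    """Non-zero when any example no longer matches, so a CI step fails."""
    return 1 if results.failed else 0
```

Calling check_recipe(RECIPE) runs both examples and reports no failures; if a change in the library altered the printed result, the mismatch would be reported and ci_exit_code would return 1.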

Re: Apache Arrow Cookbook

Posted by Nic <th...@gmail.com>.
GitHub's "Issues" feature was enabled on the Arrow cookbook repo a few days
ago, and it is where we hope to gather feedback, discuss issues, and collect
requests for recipes. There are already a few ongoing discussions around
implementation details.

https://github.com/apache/arrow-cookbook/issues

Re: Apache Arrow Cookbook

Posted by Wes McKinney <we...@gmail.com>.
hi Alessandro — I just merged the PR, thank you! I would still like us
to move to use ipython_directive in the authoring of Python examples
so that authors do not have to copy-paste console output into the
recipes, but that doesn't have to be addressed right now.

Thanks,
Wes
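
For context, ipython_directive is the Sphinx extension shipped with IPython (module IPython.sphinxext.ipython_directive): code written under its ipython directive is executed when the documentation is built, and the real output is inserted into the rendered page, so nobody has to paste console output by hand. A minimal sketch of enabling it in a Sphinx conf.py follows; it assumes a Sphinx-based Python cookbook and omits the rest of the configuration. This thread does not say how, or whether, the cookbook adopted it.

```python
# conf.py (Sphinx configuration): a minimal sketch, not the cookbook's
# actual configuration. Both extension modules ship with IPython.
extensions = [
    "IPython.sphinxext.ipython_directive",             # provides `.. ipython::`
    "IPython.sphinxext.ipython_console_highlighting",  # highlights IPython sessions
]

# With the extension enabled, a recipe in an .rst file could be written as:
#
#   .. ipython:: python
#
#      import pyarrow as pa
#      pa.array([1, 2, 3])
#
# and Sphinx runs the code during the build, inserting the actual output
# instead of text copied from a console.
```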

Re: Apache Arrow Cookbook

Posted by Alessandro Molina <al...@ursacomputing.com>.
Hi everybody,

The Cookbook PR has been open for more than a week at this point, and we
have received tons of great feedback and suggestions, many of which we
have already incorporated.
To be able to verify the publishing workflow and the CI, I'd love to ask
if anyone could merge the PR (unless there are major blockers), as it's an
Apache repository and thus requires explicit permissions.
That way we can start verifying that the build process we put in place
leads to the expected results, and maybe add a link to the Cookbook from
the Arrow documentation before the new documentation gets deployed for 5.0.0.

Re: Apache Arrow Cookbook

Posted by Alessandro Molina <al...@ursacomputing.com>.
The Pull Request for the Cookbook has been created (
https://github.com/apache/arrow-cookbook/pull/1 ).
I left comments in the PR listing the steps needed to enable building the
cookbook once the PR is merged (enabling GitHub Actions, GitHub Pages,
etc.). Whoever merges it should probably also take care of those few
steps, so that we can make sure all the pieces are in place.
Thanks!

Re: Apache Arrow Cookbook

Posted by Wes McKinney <we...@gmail.com>.
I just initialized

https://github.com/apache/arrow-cookbook

Re: Apache Arrow Cookbook

Posted by Wes McKinney <we...@gmail.com>.
On Wed, Jul 14, 2021 at 8:33 AM Alessandro Molina
<al...@ursacomputing.com> wrote:
>
> On Tue, Jul 13, 2021 at 2:40 PM Wes McKinney <we...@gmail.com> wrote:
>
> > I requested its creation here
> >
> > https://github.com/apache/arrow-cookbook
> >
> > If you can set up a PR into this repo (not sure if I need to push an
> > empty "initial commit" repo, but let me know),
>
>
> Seems your concern was correct, you can't open PRs against an empty
> repository.
> If you could make an initial commit that would be great.

OK, will do.

>
> > please make sure
> > everyone who has contributed has an ICLA on file with the ASF
> > secretary. I'm not sure that it's necessary for us to conduct an IP
> > clearance but others can comment if they disagree.
> >
>
> I guess that http://people.apache.org/phonebook.html only covers a list of
> committers and PMC members, not general contributors.
> Is there any way to check if any contributor has already signed an ICLA?
> Also, for my general understanding, should we ask to sign the ICLA before
> accepting/merging PRs or is it acceptable to merge PRs from occasional
> contributions even in absence of a signed ICLA?

Since https://github.com/ursacomputing/arrow-cookbook is "outside the
Arrow community", having the contributors to this repository sign
ICLAs would be a good practice before moving the code to an Apache
repository. Since this codebase isn't very old and we probably won't
be making official ASF releases of this project, the formal IP
clearance process is likely not necessary.

We don't need ICLAs from normal contributors into Apache repositories.

Re: Apache Arrow Cookbook

Posted by Alessandro Molina <al...@ursacomputing.com>.
On Tue, Jul 13, 2021 at 2:40 PM Wes McKinney <we...@gmail.com> wrote:

> I requested its creation here
>
> https://github.com/apache/arrow-cookbook
>
> If you can set up a PR into this repo (not sure if I need to push an
> empty "initial commit" repo, but let me know),


It seems your concern was correct: you can't open PRs against an empty
repository.
If you could make an initial commit, that would be great.


> please make sure
> everyone who has contributed has an ICLA on file with the ASF
> secretary. I'm not sure that it's necessary for us to conduct an IP
> clearance but others can comment if they disagree.
>

I guess that http://people.apache.org/phonebook.html only lists
committers and PMC members, not general contributors.
Is there any way to check whether a contributor has already signed an ICLA?
Also, for my general understanding, should we ask contributors to sign the
ICLA before accepting/merging their PRs, or is it acceptable to merge PRs
from occasional contributors even without a signed ICLA?

Re: Apache Arrow Cookbook

Posted by Wes McKinney <we...@gmail.com>.
I requested its creation here

https://github.com/apache/arrow-cookbook

If you can set up a PR into this repo (not sure if I need to push an
empty "initial commit" repo, but let me know), please make sure
everyone who has contributed has an ICLA on file with the ASF
secretary. I'm not sure that it's necessary for us to conduct an IP
clearance but others can comment if they disagree.

We also need to ask ASF Infra to allow only squash commits from PRs. I
think it's OK if you want to use GitHub issues for the cookbook work.

Thanks

On Tue, Jul 13, 2021 at 3:25 AM Alessandro Molina
<al...@ursacomputing.com> wrote:
>
> How should we go about "requesting" an arrow-cookbook repository under
> the Apache organization? Is there a form or request that has to be
> submitted?
> Another thing we were wondering about: handling contributions through
> GitHub Issues would lower the barrier for users who find problems and want
> to report them. Once the repository is moved under the Apache
> organization, will we be able to keep using GitHub Issues as the Rust
> projects do, or should we require JIRA for reporting issues?
>
> On Fri, Jul 9, 2021 at 5:59 PM Wes McKinney <we...@gmail.com> wrote:
>
> > Some benefits of separating the cookbook from the documentation would
> > be to decouple its release / publication from Arrow releases, so you
> > can roll out new content to the published version as soon as it's
> > merged into the repository, whereas we might not want to publish
> > inter-release changes to the documentation. You could
> > also have a separate entry point to increase navigability (since the
> > documentation is intended to be more of a reference book).
> >
> > Given that the Rust projects have decoupled into multiple
> > repositories, a "cookbook" repository could also be a place to collect
> > recipes related to DataFusion.
> >
> > Either option is plenty reasonable, though, so feel free to choose
> > what makes the most sense to you.
> >
> > On Thu, Jul 8, 2021 at 12:09 PM Alessandro Molina
> > <al...@ursacomputing.com> wrote:
> > >
> > > Thinking about it, having the cookbook in its own repository
> > > (apache/arrow-cookbook) might lower the barrier for contributors. You
> > > only need to clone the cookbook, and running `make` also takes care of
> > > installing the required dependencies, so in theory you don't even need
> > > to worry much about setting up your environment. But we can certainly
> > > improve the README in the repo further to ease contributions.
> > >
> > > I think we can also preserve the benefit Nic mentioned, making sure the
> > > recipes are verified on each Arrow build, by triggering a build of the
> > > cookbook repository on each new arrow master change. Worst case, we can
> > > have a nightly build for the cookbook that clones the latest arrow
> > > master branch.
> > >
> > > Having a cookbook for C++ is a very good idea; that might be the next
> > > step once we finish the Python and R versions. If people want to
> > > contribute cookbook versions for more languages, that would be greatly
> > > appreciated too.
> > >
> > > On the other hand, while we want to keep the cookbooks in the same
> > > repository, sharing the same infrastructure to keep the entry barrier
> > > low (make py/r/X will just compile the cookbook for the language you
> > > picked), I feel that keeping the cookbook separate per language is a
> > > good idea. While it's cool to be able to compare solutions across
> > > languages, developers generally look for the solution in their target
> > > language and might perceive the other implementations as noise.
> > > For example, we received similar feedback about the Arrow documentation:
> > > as a Python developer it's hard to find what you are looking for,
> > > because it's mixed with the "format" and "C++" documentation and there
> > > are a few links back and forth between them.
> > >
> > > On Thu, Jul 8, 2021 at 11:39 AM Nic <th...@gmail.com> wrote:
> > >
> > > > One of the possible aims for the cookbook is having interlinked
> > > > documentation between the function docs and the cookbook, and both the
> > > > R and Python docs include tests checking that all of the outputs are
> > > > as expected. Including these tests means that we can immediately see
> > > > if any code changes render any recipes incorrect. Therefore the
> > > > decoupling between cookbook updates and docs updates may not be
> > > > necessary.
> > > >
> > > > That said, there has been mention of having versions of the cookbook
> > > > tied to released versions of Arrow, which sounds like a great idea.
> > > >
> > > > The repo also includes a Makefile which creates all the relevant setup,
> > > > so hopefully that should simplify things for users. The R cookbook uses
> > > > bookdown, which has a feature where a reader can click an 'edit' button
> > > > and it automatically creates a fork where they can edit the cookbook
> > > > and submit a PR directly from GitHub.
> > > >
> > > > It'd be great to see a lot of recipes in multiple languages, but in the
> > > > document of possible recipes circulated previously, we identified
> > > > slightly different needs for recipes for R/Python, and this may be
> > > > further complicated by writing for slightly different audiences (from
> > > > what I understand, the pyarrow implementation may be more geared
> > > > towards people building on top of the low-level bindings, whereas in
> > > > R, we have both that audience as well as folks who just want to make
> > > > their dplyr code run faster without needing to know that much about
> > > > the details of Arrow).
> > > >
> > > > I wonder, though, if we could still achieve that by having an
> > > > additional page that points to the recipes that *are* common to all
> > > > the cookbooks.
> > > >
> > > > On Thu, 8 Jul 2021 at 10:07, Antoine Pitrou <an...@python.org> wrote:
> > > >
> > > > >
> > > > > Hi Rares,
> > > > >
> > > > > Documentation bugs and improvement requests are welcome, feel free to
> > > > > file them on the JIRA!
> > > > >
> > > > > Regards
> > > > >
> > > > > Antoine.
> > > > >
> > > > >
> > > > > Le 08/07/2021 à 01:45, Rares Vernica a écrit :
> > > > > > Awesome! We would find C++ versions of these recipes very useful. From
> > > > > > our experience, the C++ API is much, much harder to deal with and more
> > > > > > error-prone than the R/Python ones.
> > > > > >
> > > > > > Cheers,
> > > > > > Rares
> > > > > >
> > > > > > On Wed, Jul 7, 2021 at 9:07 AM Alessandro Molina <
> > > > > > alessandro@ursacomputing.com> wrote:
> > > > > >
> > > > > >> Yes, that was mostly what I meant when I wrote that the next step is
> > > > > >> opening a PR against the apache/arrow repository itself :D
> > > > > >> We initially moved forward in a separate repository to be able to
> > > > > >> iterate more quickly, but we reached a point where we think we can
> > > > > >> start integrating the cookbook with the Arrow documentation itself.
> > > > > >>
> > > > > >> If instead it's preferred to move the effort into its own separate
> > > > > >> repository (apache/arrow-cookbook), that's an option too; we are open
> > > > > >> to suggestions from the community.
> > > > > >>
> > > > > >> On Wed, Jul 7, 2021 at 5:57 PM Wes McKinney <we...@gmail.com> wrote:
> > > > > >>
> > > > > >>> What do you think about developing this cookbook in an Apache Arrow
> > > > > >>> repository (it could be something like apache/arrow-cookbook, if not
> > > > > >>> part of the main development repo)? Creating expanded documentation
> > > > > >>> resources for learning how to use Apache Arrow to solve problems
> > > > > >>> certainly seems within the bounds of the community's objectives.
> > > > > >>>
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Nic Crane
> > > > _______________________
> > > > @nic_crane <https://twitter.com/nic_crane>
> > > > https://thisisnic.github.io/
> > > >
> >

Re: Apache Arrow Cookbook

Posted by Alessandro Molina <al...@ursacomputing.com>.
How should we go about "requesting" an arrow-cookbook repository under
the Apache organization? Is there a form or request that has to be
submitted?
Another thing we were wondering about: handling contributions through
GitHub Issues would lower the barrier for users who find problems and want
to report them. Once the repository is moved under the Apache
organization, will we be able to keep using GitHub Issues as the Rust
projects do, or should we require JIRA for reporting issues?

Re: Apache Arrow Cookbook

Posted by Wes McKinney <we...@gmail.com>.
Some benefits of separating the cookbook from the documentation would
be to decouple its release / publication from Arrow releases, so you
can roll out new content to the published version as soon as it's
merged into the repository, whereas we might not want to publish
inter-release changes to the documentation in the same fashion. You could
also have a separate entry point to increase navigability (since the
documentation is intended to be more of a reference book).

Given that the Rust projects have decoupled into multiple
repositories, a "cookbook" repository could also be a place to collect
recipes related to DataFusion.

Either option is plenty reasonable, though, so feel free to choose
what makes the most sense to you.

Re: Apache Arrow Cookbook

Posted by Alessandro Molina <al...@ursacomputing.com>.
Thinking about it, I think that having the cookbook in its own repository
(apache/arrow-cookbook) might lower the barrier for contributors. You only
need to clone the cookbook, and running `make` also takes care of
installing the required dependencies, so in theory you don't even need to
care too much about setting up your environment. But we can surely improve
the README in the repo further to ease contributions.

I think we can also preserve the benefit that Nic mentioned of making sure
that the recipes are verified on each Arrow build, by triggering a build of
the cookbook repository on each new arrow master change. Worst case, we can
have a nightly build for the cookbook that clones the latest arrow master branch.

Having a cookbook for C++ is a very good idea; that might be the next step
once we finish the Python and R versions. If people want to contribute
cookbook versions for more languages, that would be greatly appreciated too.

On the other hand, while we want to keep the cookbooks in the same
repository and share the same infrastructure to keep a low entry barrier
(make py/r/X will just compile the cookbook for the language you picked), I
feel that keeping the cookbook separated per language is a good idea. While
it's cool to be able to compare the solution between languages, in general
developers look for the solution in their target language and might
perceive the other implementations as noise.
For example, we received similar feedback for the Arrow documentation too:
as a Python developer it's hard to find what you are looking for
because it's mixed with the "format" and "C++" documentation and there are
a few links back and forth between them.

Re: Apache Arrow Cookbook

Posted by Nic <th...@gmail.com>.
One of the possible aims for the cookbook is having interlinked
documentation between function docs and the cookbook, and both the R and
Python docs include tests that check that all of the outputs are as expected.
Including these tests means that we can immediately see if any code changes
render any recipes incorrect. Therefore the decoupling between cookbook
updates and docs updates may not be necessary.
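The idea of recipes that carry their own expected output can be illustrated with Python's standard-library `doctest` module. This is a minimal sketch under stated assumptions, not the cookbook's actual tooling: the `RECIPE` text and the `check_recipe` helper are hypothetical, and the recipe uses plain Python instead of pyarrow so that it runs without extra dependencies. Because the expected output sits next to the code, rerunning the code against a new Arrow version would reveal any recipe whose output has changed.

```python
import doctest

# Hypothetical recipe: prose followed by interactive examples whose printed
# output is the recipe's expected result. A real Arrow recipe would call
# pyarrow; plain Python keeps this sketch dependency-free.
RECIPE = """
Sum a list of values:

>>> values = [1, 2, 3, 4]
>>> sum(values)
10
"""


def check_recipe(text, name="recipe"):
    """Run every example in `text`; return (failed, attempted, report)."""
    # Parse the recipe text into a DocTest with a fresh global namespace.
    test = doctest.DocTestParser().get_doctest(text, {}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    report = []
    # Collect the failure report instead of printing it to stdout.
    result = runner.run(test, out=report.append)
    return result.failed, result.attempted, "".join(report)


if __name__ == "__main__":
    failed, attempted, report = check_recipe(RECIPE)
    print(f"{attempted - failed}/{attempted} examples passed")
    if report:
        print(report)
```

A hypothetical CI step could run `check_recipe` over every recipe and fail the build whenever `failed` is non-zero. For instance, `check_recipe(">>> sum([1, 2, 3])\n7\n")` reports one failure, and its report shows the actual output 6 next to the expected 7.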

That said, there has been mention of having versions of the cookbook tied
to released versions of Arrow, which sounds like a great idea.

The repo also includes a Makefile which handles all the relevant setup, so
hopefully that should simplify things for users. The R cookbook uses
bookdown, which has a feature where a reader can click an 'edit' button and
it automatically creates a fork where they can edit the cookbook and submit
a PR directly from GitHub.

It'd be great to see a lot of recipes in multiple languages, but in the
document of possible recipes circulated previously, we identified slightly
different needs for recipes for R/Python, and this may be further
complicated by writing for slightly different audiences (from what I
understand, the pyarrow implementation may be more geared towards people
building on top of the low-level bindings, whereas in R, we have both that
audience as well as folks who just want to make their dplyr code run faster
without needing to know that much about the details of Arrow).

I wonder, though, if we could still achieve that by having an additional
page that points to the recipes that *are* common to both cookbooks.
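As a rough illustration of the testing point in the first paragraph (a recipe that carries its own expected output, so a CI job fails as soon as the output changes), here is a minimal sketch using Python's standard-library doctest module. It is a toy stand-in, not an actual Arrow recipe, and the function name `table_row_count` is hypothetical; the cookbook's real test tooling may well differ.

```python
import doctest


def table_row_count(rows):
    """Hypothetical toy recipe: count the rows in a list-of-dicts "table".

    The example and its expected output live together in the recipe, so
    re-running it in CI flags any change in behaviour:

    >>> table_row_count([{"n": 1}, {"n": 2}, {"n": 3}])
    3
    """
    return len(rows)


if __name__ == "__main__":
    # Runs every '>>>' example found in this module's docstrings and
    # compares the actual output with the expected text that follows it.
    results = doctest.testmod()
    print(f"{results.attempted} examples run, {results.failed} failed")
```

A non-zero `failed` count is what a CI phase would turn into a failing build.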

On Thu, 8 Jul 2021 at 10:07, Antoine Pitrou <an...@python.org> wrote:

>
> Hi Rares,
>
> Documentation bugs and improvement requests are welcome, feel free to
> file them on the JIRA!
>
> Regards
>
> Antoine.
>


-- 
Nic Crane
_______________________
@nic_crane <https://twitter.com/nic_crane>
https://thisisnic.github.io/

Re: Apache Arrow Cookbook

Posted by Antoine Pitrou <an...@python.org>.
Hi Rares,

Documentation bugs and improvement requests are welcome, feel free to 
file them on the JIRA!

Regards

Antoine.


Le 08/07/2021 à 01:45, Rares Vernica a écrit :
> Awesome! We would find C++ versions of these recipes very useful. From our
> experience the C++ API is much much harder to deal with and error prone
> than the R/Python one.
> 
> Cheers,
> Rares

Re: Apache Arrow Cookbook

Posted by Matt Topol <zo...@gmail.com>.
Personally I'd love to see a cookbook where a recipe is accompanied by
examples of how to accomplish it in multiple languages rather than having
separate cookbooks for each language.

Though that may just be me wanting to see more love for the Golang
implementation...

On Wed, Jul 7, 2021, 8:57 PM Eduardo Ponce <ed...@gmail.com> wrote:

> Here is additional food for thought.
> The cookbook currently contains examples for C++, R, and Python. Is there a
> plan (or wish) to eventually extend a single cookbook to include examples
> from other languages (eg., Rust, Java)?
> If so, then putting the cookbook into its own (language agnostic) repo
> would make more sense.
> On the other hand, if the cookbook is to be limited in Arrow languages,
> then what would happen if a Rust cookbook is created? Would it be placed in
> the arrow-rs repo or as a standalone arrow/cookbook-rs repo?
>
> ~Eduardo

Re: Apache Arrow Cookbook

Posted by Eduardo Ponce <ed...@gmail.com>.
Here is additional food for thought.
The cookbook currently contains examples for C++, R, and Python. Is there a
plan (or wish) to eventually extend a single cookbook to include examples
from other languages (e.g., Rust, Java)?
If so, then putting the cookbook into its own (language-agnostic) repo
would make more sense.
On the other hand, if the cookbook is to be limited in the Arrow languages it
covers, then what would happen if a Rust cookbook is created? Would it be
placed in the arrow-rs repo or in a standalone arrow/cookbook-rs repo?

~Eduardo

On Wed, Jul 7, 2021 at 8:09 PM Eduardo Ponce <ed...@gmail.com> wrote:

> Great work!
> I would recommend having the cookbook in its own repo so that its updates
> are not constrained by the timeline used for updating the public Arrow
> documentation.
> This will allow users that are not involved in Arrow development to
> contribute or provide suggestions to the cookbook fairly easily (no need to
> download Arrow and build doc).
> Also, the repo could be used to provide example programs for some of the
> cookbook's recipes and even have a place for users to share their own
> examples.
>
> ~Eduardo

Re: Apache Arrow Cookbook

Posted by Eduardo Ponce <ed...@gmail.com>.
Great work!
I would recommend having the cookbook in its own repo so that its updates
are not constrained by the timeline used for updating the public Arrow
documentation.
This would allow users who are not involved in Arrow development to
contribute to, or provide suggestions for, the cookbook fairly easily (no
need to download Arrow and build the docs).
Also, the repo could be used to provide example programs for some of the
cookbook's recipes and even have a place for users to share their own
examples.

~Eduardo

On Wed, Jul 7, 2021 at 7:45 PM Rares Vernica <rv...@gmail.com> wrote:

> Awesome! We would find C++ versions of these recipes very useful. From our
> experience the C++ API is much much harder to deal with and error prone
> than the R/Python one.
>
> Cheers,
> Rares

Re: Apache Arrow Cookbook

Posted by Rares Vernica <rv...@gmail.com>.
Awesome! We would find C++ versions of these recipes very useful. From our
experience, the C++ API is much, much harder to deal with and more
error-prone than the R/Python one.

Cheers,
Rares

On Wed, Jul 7, 2021 at 9:07 AM Alessandro Molina <
alessandro@ursacomputing.com> wrote:

> Yes, that was mostly what I meant when I wrote that the next step is
> opening a PR against the apache/arrow repository itself :D
> We moved forward in a separate repository initially to be able to cycle
> more quickly, but we reached a point where we think we can start
> integrating the cookbook with the Arrow documentation itself.
>
> If instead it's preferred to move forward the effort into its own separated
> repository (apache/arrow-cookbook) that's an option too, we are open to
> suggestions from the community.
>
> On Wed, Jul 7, 2021 at 5:57 PM Wes McKinney <we...@gmail.com> wrote:
>
> > What do you think about developing this cookbook in an Apache Arrow
> > repository (it could be something like apache/arrow-cookbook, if not
> > part of the main development repo)? Creating expanded documentation
> > resources for learning how to use Apache Arrow to solve problems
> > certainly seems within the bounds of the community's objectives.
> >
> > On Wed, Jul 7, 2021 at 5:52 PM Alessandro Molina
> > <al...@ursacomputing.com> wrote:
> > >
> > > We finally have a first preview of the cookbook available for R and
> > > Python; for anyone interested, the two versions are available at
> > > http://ursacomputing.com/arrow-cookbook/py/index.html and
> > > http://ursacomputing.com/arrow-cookbook/r/index.html
> > > A new version of the cookbook is automatically published whenever a new
> > > recipe is added.
> > >
> > > After gathering feedback from interested parties and users, our plan
> > > for the next step would be to open a PR against the arrow repository
> > > and automate publishing the cookbook via GitHub Actions.
> > >
> > > At the moment, the recipes implemented are nearly half of those
> > > identified in the dedicated Google Doc (
> > > https://docs.google.com/document/d/1v-jK_9osnLvAnAjLOM_frgzakjFhLpUi8OC0MlKpxzw/edit?ts=60c73189#heading=h.m7fas2talgy5
> > > ), so if you have recipes to suggest, feel free to leave comments on
> > > that document or suggest edits.
> > >
> > >
> > > On Mon, Jun 21, 2021 at 10:34 AM Alessandro Molina <
> > > alessandro@ursacomputing.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'd like to share with the mailing list an idea that Nic Crane and I
> > > > have been experimenting with. It's still at an early stage, but we hope
> > > > to turn it into a PR for the Arrow documentation soon.
> > > >
> > > > The idea is to work on a Cookbook, a collection of ready-made recipes on
> > > > how to use Arrow, that both end users and developers of third-party
> > > > libraries can refer to when they need to look up "the arrow way" of
> > > > doing something.
> > > >
> > > > While the Arrow documentation covers all the features and functions
> > > > available in Arrow, it's not always obvious to a new user how best to
> > > > combine them. Sometimes the solution ends up being more complicated than
> > > > necessary, or performs badly due to non-obvious side effects such as
> > > > unexpected memory copies.
> > > >
> > > > For this reason we thought about starting documentation that users can
> > > > refer to on how to combine Arrow features to achieve the results they
> > > > care about.
> > > >
> > > > We wrote a short document explaining the idea at
> > > > https://docs.google.com/document/d/1v-jK_9osnLvAnAjLOM_frgzakjFhLpUi8OC0MlKpxzw/edit?usp=sharing
> > > >
> > > > The core idea behind the cookbook is that all recipes should be testable,
> > > > so it should be possible to add a CI phase for the cookbook that verifies
> > > > that all the recipes still work with the current version of Arrow and
> > > > lead to the expected results.
> > > >
> > > > For now we have started it in a separate repository (
> > > > https://github.com/ursacomputing/arrow-cookbook ), but we are not yet
> > > > sure whether it should live inside arrow/docs, in its own directory
> > > > (i.e. arrow/cookbook) or in its own repository. In the end it's fairly
> > > > decoupled from the rest of Arrow and the documentation, which would have
> > > > the benefit of allowing a dedicated release cycle every time new recipes
> > > > are added (at least in the early phase).
> > > >
> > > > We are also looking for more ideas about recipes that would be good
> > > > candidates for inclusion, so if any of you have thoughts about which
> > > > recipes we should add, please feel free to comment on the document or
> > > > reply by mail suggesting more recipes.
> > > >
> > > > Any suggestions for improvements are appreciated! We hope to have
> > > > something we can release with the next Arrow release.
> > > >
> >
>

Re: Apache Arrow Cookbook

Posted by Alessandro Molina <al...@ursacomputing.com>.
Yes, that was mostly what I meant when I wrote that the next step is
opening a PR against the apache/arrow repository itself :D
We moved forward in a separate repository initially to be able to iterate
more quickly, but we have reached a point where we think we can start
integrating the cookbook with the Arrow documentation itself.

If it's instead preferred to move the effort into its own separate
repository (apache/arrow-cookbook), that's an option too; we are open to
suggestions from the community.

On Wed, Jul 7, 2021 at 5:57 PM Wes McKinney <we...@gmail.com> wrote:

> What do you think about developing this cookbook in an Apache Arrow
> repository (it could be something like apache/arrow-cookbook, if not
> part of the main development repo)? Creating expanded documentation
> resources for learning how to use Apache Arrow to solve problems
> certainly seems within the bounds of the community's objectives.
>
> On Wed, Jul 7, 2021 at 5:52 PM Alessandro Molina
> <al...@ursacomputing.com> wrote:
> >
> > We finally have a first preview of the cookbook available for R and
> > Python; for anyone interested, the two versions are available at
> > http://ursacomputing.com/arrow-cookbook/py/index.html and
> > http://ursacomputing.com/arrow-cookbook/r/index.html
> > A new version of the cookbook is automatically published whenever a new
> > recipe is added.
> >
> > After gathering feedback from interested parties and users, our plan for
> > the next step would be to open a PR against the arrow repository and
> > automate publishing the cookbook via GitHub Actions.
> >
> > At the moment, the recipes implemented are nearly half of those
> > identified in the dedicated Google Doc (
> > https://docs.google.com/document/d/1v-jK_9osnLvAnAjLOM_frgzakjFhLpUi8OC0MlKpxzw/edit?ts=60c73189#heading=h.m7fas2talgy5
> > ), so if you have recipes to suggest, feel free to leave comments on that
> > document or suggest edits.
> >
> >
> > On Mon, Jun 21, 2021 at 10:34 AM Alessandro Molina <
> > alessandro@ursacomputing.com> wrote:
> >
> > > Hi,
> > >
> > > I'd like to share with the mailing list an idea that Nic Crane and I
> > > have been experimenting with. It's still at an early stage, but we hope
> > > to turn it into a PR for the Arrow documentation soon.
> > >
> > > The idea is to work on a Cookbook, a collection of ready-made recipes on
> > > how to use Arrow, that both end users and developers of third-party
> > > libraries can refer to when they need to look up "the arrow way" of
> > > doing something.
> > >
> > > While the Arrow documentation covers all the features and functions
> > > available in Arrow, it's not always obvious to a new user how best to
> > > combine them. Sometimes the solution ends up being more complicated than
> > > necessary, or performs badly due to non-obvious side effects such as
> > > unexpected memory copies.
> > >
> > > For this reason we thought about starting documentation that users can
> > > refer to on how to combine Arrow features to achieve the results they
> > > care about.
> > >
> > > We wrote a short document explaining the idea at
> > > https://docs.google.com/document/d/1v-jK_9osnLvAnAjLOM_frgzakjFhLpUi8OC0MlKpxzw/edit?usp=sharing
> > >
> > > The core idea behind the cookbook is that all recipes should be testable,
> > > so it should be possible to add a CI phase for the cookbook that verifies
> > > that all the recipes still work with the current version of Arrow and
> > > lead to the expected results.
> > >
> > > For now we have started it in a separate repository (
> > > https://github.com/ursacomputing/arrow-cookbook ), but we are not yet
> > > sure whether it should live inside arrow/docs, in its own directory
> > > (i.e. arrow/cookbook) or in its own repository. In the end it's fairly
> > > decoupled from the rest of Arrow and the documentation, which would have
> > > the benefit of allowing a dedicated release cycle every time new recipes
> > > are added (at least in the early phase).
> > >
> > > We are also looking for more ideas about recipes that would be good
> > > candidates for inclusion, so if any of you have thoughts about which
> > > recipes we should add, please feel free to comment on the document or
> > > reply by mail suggesting more recipes.
> > >
> > > Any suggestions for improvements are appreciated! We hope to have
> > > something we can release with the next Arrow release.
> > >
>

Re: Apache Arrow Cookbook

Posted by Wes McKinney <we...@gmail.com>.
What do you think about developing this cookbook in an Apache Arrow
repository (it could be something like apache/arrow-cookbook, if not
part of the main development repo)? Creating expanded documentation
resources for learning how to use Apache Arrow to solve problems
certainly seems within the bounds of the community's objectives.

On Wed, Jul 7, 2021 at 5:52 PM Alessandro Molina
<al...@ursacomputing.com> wrote:
>
> We finally have a first preview of the cookbook available for R and Python;
> for anyone interested, the two versions are available at
> http://ursacomputing.com/arrow-cookbook/py/index.html and
> http://ursacomputing.com/arrow-cookbook/r/index.html
> A new version of the cookbook is automatically published whenever a new
> recipe is added.
>
> After gathering feedback from interested parties and users, our plan for
> the next step would be to open a PR against the arrow repository and
> automate publishing the cookbook via GitHub Actions.
>
> At the moment, the recipes implemented are nearly half of those identified
> in the dedicated Google Doc (
> https://docs.google.com/document/d/1v-jK_9osnLvAnAjLOM_frgzakjFhLpUi8OC0MlKpxzw/edit?ts=60c73189#heading=h.m7fas2talgy5
> ), so if you have recipes to suggest, feel free to leave comments on that
> document or suggest edits.
>
>
> On Mon, Jun 21, 2021 at 10:34 AM Alessandro Molina <
> alessandro@ursacomputing.com> wrote:
>
> > Hi,
> >
> > I'd like to share with the mailing list an idea that Nic Crane and I
> > have been experimenting with. It's still at an early stage, but we hope
> > to turn it into a PR for the Arrow documentation soon.
> >
> > The idea is to work on a Cookbook, a collection of ready-made recipes on
> > how to use Arrow, that both end users and developers of third-party
> > libraries can refer to when they need to look up "the arrow way" of
> > doing something.
> >
> > While the Arrow documentation covers all the features and functions
> > available in Arrow, it's not always obvious to a new user how best to
> > combine them. Sometimes the solution ends up being more complicated than
> > necessary, or performs badly due to non-obvious side effects such as
> > unexpected memory copies.
> >
> > For this reason we thought about starting documentation that users can
> > refer to on how to combine Arrow features to achieve the results they
> > care about.
> >
> > We wrote a short document explaining the idea at
> > https://docs.google.com/document/d/1v-jK_9osnLvAnAjLOM_frgzakjFhLpUi8OC0MlKpxzw/edit?usp=sharing
> >
> > The core idea behind the cookbook is that all recipes should be testable,
> > so it should be possible to add a CI phase for the cookbook that verifies
> > that all the recipes still work with the current version of Arrow and
> > lead to the expected results.
> >
> > For now we have started it in a separate repository (
> > https://github.com/ursacomputing/arrow-cookbook ), but we are not yet
> > sure whether it should live inside arrow/docs, in its own directory
> > (i.e. arrow/cookbook) or in its own repository. In the end it's fairly
> > decoupled from the rest of Arrow and the documentation, which would have
> > the benefit of allowing a dedicated release cycle every time new recipes
> > are added (at least in the early phase).
> >
> > We are also looking for more ideas about recipes that would be good
> > candidates for inclusion, so if any of you have thoughts about which
> > recipes we should add, please feel free to comment on the document or
> > reply by mail suggesting more recipes.
> >
> > Any suggestions for improvements are appreciated! We hope to have
> > something we can release with the next Arrow release.
> >