Posted to dev@arrow.apache.org by Dominik Moritz <do...@apache.org> on 2021/06/10 17:37:21 UTC

Re: Long title on github page

I thought there were some good suggestions in this thread. @Wes, did you
find a description you liked?

On May 18, 2021 at 06:24:47, Adam Hooper <ad...@adamhooper.com> wrote:

> Poll question: why did you choose Arrow?
>
> Personally: I researched Arrow because it's a spec for IPC. (My requirement
> was: "wrap computations in a separate process.") I chose Arrow for its
> community and ecosystem -- in other words, because my peers chose it.
>
> I happen to use the compute kernel and Parquet capabilities every day; but
> they did not sway me at all. I would choose Arrow if it were nothing but
> this spec and this community. (I chose HTML, after all.)
>
> I see the *code* as one enormous proof that the *spec* is good, and as a
> collection of examples and best practices.
>
> ... so a great pitch to me would be: "Apache Arrow is a data format and
> toolbox for efficient in-memory processing."
>
> Enjoy life,
> Adam
>
> On Tue, May 18, 2021 at 2:38 AM Aldrin <ak...@ucsc.edu.invalid> wrote:
>
> "Apache Arrow is a data processing library that also provides a uniform,
> efficient interface for data systems."
>
> This probably still isn't quite right; I imagine the bit about "for data
> systems" needs some addition (maybe "for transport between data systems")?
>
> My primary motivators:
>
>    - "A data processing library":
>       - Arrow provides many language bindings, but ultimately they're all
>       part of the same "library ecosystem", which I think is fine to
>       capture in "library"
>       - A main goal of Arrow is for processing to be fast, whatever that
>       processing may be
>    - "uniform, efficient interface for data systems":
>       - Arrow provides (or tries to provide) a cohesive ("uniform")
>       interface for data processing (although it has several APIs to do this)
>       - Also, IMO, a motivation for Arrow was a format and library to
>       facilitate processing, but that provided functions and interfaces to
>       easily translate into optimized data formats used by disparate data
>       systems (Cassandra, Hadoop, etc.).
>       - Arrow tries to be transparently zero-copy, which is part of the
>       interface for efficiency
>    - Arrow certainly has a data format, but that format is the crux of the
>    interface (IMO). However, it also makes using other formats easy (via
>    filesystem API and Parquet readers/writers, etc.). So, focusing on the
>    data format seems unnecessary in such a terse description.
>
> Aldrin Montana
> Computer Science PhD Student
> UC Santa Cruz
>
> On Mon, May 17, 2021 at 5:07 PM Weston Pace <we...@gmail.com> wrote:
>
> > I'd avoid the word "structured" as it is somewhat ill-defined.
> >
> > On Mon, May 17, 2021 at 12:37 PM Mauricio Vargas
> > <ma...@ursacomputing.com> wrote:
> > >
> > > more marketed:
> > > How about: "Apache Arrow is a format and language-agnostic library
> > > focused on efficient sharing and processing of structured data."
> > >
> > > On Mon, May 17, 2021 at 6:25 PM Micah Kornfield <emkornfield@gmail.com>
> > > wrote:
> > > >
> > > > How about: "Apache Arrow is a collection of specifications, cross
> > > > language libraries and applications focused on efficient sharing and
> > > > processing of structured data."
> > > >
> > > > On Mon, May 17, 2021 at 3:06 PM Wes McKinney <we...@gmail.com> wrote:
> > > > >
> > > > > On Mon, May 17, 2021 at 4:58 PM Weston Pace <weston.pace@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > “Apache Arrow is a format and compute kernel for in-memory data”
> > > > > >
> > > > > > I like this but no one ever knows what "in-memory" means (or they
> > > > > > just think 'data is always in memory').  How about...
> > > > > >
> > > > > > "Apache Arrow is a format and compute kernel for zero-copy
> > > > > > processing and sharing of data."
> > > > > >
> > > > > > or...
> > > > > >
> > > > > > "Apache Arrow is a format and compute kernel for processing and
> > > > > > sharing data without serialization overhead."
> > > > >
> > > > > A few issues with this:
> > > > >
> > > > > * Multiple PL aspect unclear (is it a single piece of software, or
> > > > > multiple pieces of software?)
> > > > > * Development platform aspect unclear
> > > > >
> > > > > I see that some people don't like the word "platform". Some people
> > > > > come to this project and want to find an end-to-end application,
> > > > > rather than a developer toolkit that they can use to build
> > > > > applications. Perhaps we should be more explicit and use
> > > > > "computational development toolkit" instead of "platform".
> > > > >
> > > > > > Although marshalling[1] would probably be a more precise word it
> > > > > > is not as well known.
> > > > > >
> > > > > > [1] https://en.wikipedia.org/wiki/Marshalling_(computer_science)
> > > > > >
> > > > > > On Mon, May 17, 2021 at 9:36 AM Mauricio Vargas
> > > > > > <ma...@ursacomputing.com> wrote:
> > > > > > >
> > > > > > > a few ideas
> > > > > > >
> > > > > > > github.com/apache/arrow - Apache Arrow is an efficient library
> > > > > > > for big data processing and sharing
> > > > > > >
> > > > > > > github.com/apache/arrow - Apache Arrow is a computational tool
> > > > > > > for processing, storing and sharing large datasets
> > > > > > >
> > > > > > > github.com/apache/arrow - Apache Arrow is a fast and simple
> > > > > > > library for big data analytics
> > > > > > >
> > > > > > > *github.com/apache/arrow <http://github.com/apache/arrow> -
> > > > > > > Apache Arrow is a powerful workhorse for analytic operations on
> > > > > > > modern hardware*
> > > > > > >
> > > > > > >
> > > > > > > On Mon, May 17, 2021 at 3:13 PM Julian Hyde <jhyde.apache@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Alright, well, whatever it is, it must fit into one breath. If
> > > > > > > > the high-concept pitch is successful, people will stick around
> > > > > > > > for the full pitch.
> > > > > > > >
> > > > > > > > Words such as “platform” and “enable” are noise. You say
> > > > > > > > “platform”, they start to say “what exactly do you mean by
> > > > > > > > platform”, the elevator doors open, and they’re gone.
> > > > > > > >
> > > > > > > > “Apache Arrow is a format and compute kernel for in-memory data”
> > > > > > > >
> > > > > > > >
> > > > > > > > > On May 17, 2021, at 12:03 PM, Eduardo Ponce <edponce00@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > One more suggestion for the bucket:
> > > > > > > > > "Apache Arrow is a computational platform for efficient
> > > > > > > > > in-memory data representation and processing."
> > > > > > > > >
> > > > > > > > > On Mon, May 17, 2021 at 2:49 PM Wes McKinney <wesmckinn@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> I think less is better in the description, but unfortunately the
> > > > > > > > >> association of Arrow as being "just a data format" has been
> > > > > > > > >> actively harmful in some ways to community growth. We have a data
> > > > > > > > >> format, yes, but we are also creating a computational platform to
> > > > > > > > >> go hand-in-hand with the data format to make it easier to build
> > > > > > > > >> fast applications that use the data format. So the description
> > > > > > > > >> needs to capture both of these ideas.
> > > > > > > > >>
> > > > > > > > >> On Mon, May 17, 2021 at 12:15 PM Julian Hyde <jhyde.apache@gmail.com>
> > > > > > > > >> wrote:
> > > > > > > > >>>
> > > > > > > > >>> I think that the “cross-language development platform for” is
> > > > > > > > >>> noise. (I’m sure that JPEG developers think that JPEG is a
> > > > > > > > >>> “cross-language development platform” too. But it isn’t. It is
> > > > > > > > >>> an image format.)
> > > > > > > > >>>
> > > > > > > > >>> “Apache Arrow is a data format for efficient in-memory processing.”
> > > > > > > > >>>
> > > > > > > > >>> I’ll note that in marketing speak, we are developing a
> > > > > > > > >>> high-concept pitch [1] here. Every company needs a name, a brand,
> > > > > > > > >>> a high-concept pitch, and a 3- or 4-sentence description. But
> > > > > > > > >>> every Apache project needs these too. It’s worth spending the
> > > > > > > > >>> time on the description, also, and then use them in all the
> > > > > > > > >>> places that we describe Arrow.
> > > > > > > > >>>
> > > > > > > > >>> Julian
> > > > > > > > >>>
> > > > > > > > >>> [1] https://www.growthink.com/content/whats-your-high-concept-pitch
> > > > > > > > >>>
> > > > > > > > >>>> On May 17, 2021, at 7:38 AM, Eduardo Ponce <edponce00@gmail.com>
> > > > > > > > >>>> wrote:
> > > > > > > > >>>>
> > > > > > > > >>>> I agree with Nate's and Brian's suggestions, but would like to
> > > > > > > > >>>> add that we can make it a one-liner for more conciseness and
> > > > > > > > >>>> consistency with other Apache projects.
> > > > > > > > >>>> Apologies if it seems I am going around the suggestions loop
> > > > > > > > >>>> again.
> > > > > > > > >>>>
> > > > > > > > >>>> "Apache Arrow is a cross-language development platform enabling
> > > > > > > > >>>> efficient in-memory data processing and transport."
> > > > > > > > >>>>
> > > > > > > > >>>> On Mon, May 17, 2021 at 10:11 AM Brian Hulette <bhulette@apache.org>
> > > > > > > > >>>> wrote:
> > > > > > > > >>>>
> > > > > > > > >>>>> Thank you for bringing this up, Dominik. I sampled some of the
> > > > > > > > >>>>> descriptions for other Apache projects I frequent; the ones with
> > > > > > > > >>>>> a meaningful description have a single sentence:
> > > > > > > > >>>>>
> > > > > > > > >>>>> github.com/apache/spark - Apache Spark - A unified analytics
> > > > > > > > >>>>> engine for large-scale data processing
> > > > > > > > >>>>> github.com/apache/beam - Apache Beam is a unified programming
> > > > > > > > >>>>> model for Batch and Streaming
> > > > > > > > >>>>> github.com/apache/avro - Apache Avro is a data serialization
> > > > > > > > >>>>> system
> > > > > > > > >>>>>
> > > > > > > > >>>>> Several others (Flink, Hadoop, ...) just have "[Mirror of] Apache
> > > > > > > > >>>>> <name>" as the description.
> > > > > > > > >>>>>
> > > > > > > > >>>>> +1 for Nate's suggestion "Apache Arrow is a cross-language
> > > > > > > > >>>>> development platform for in-memory data. It enables systems to
> > > > > > > > >>>>> process and transport data more efficiently."
> > > > > > > > >>>>>
> > > > > > > > >>>>> On Mon, May 17, 2021 at 5:23 AM Wes McKinney <wesmckinn@gmail.com>
> > > > > > > > >>>>> wrote:
> > > > > > > > >>>>>
> > > > > > > > >>>>>> It's probably best for the description to limit mentions of
> > > > > > > > >>>>>> specific features. There are some high-level features mentioned
> > > > > > > > >>>>>> in the description now ("computational libraries and zero-copy
> > > > > > > > >>>>>> streaming messaging and interprocess communication"), but now in
> > > > > > > > >>>>>> 2021, since the project has grown so much, it could leave people
> > > > > > > > >>>>>> with a limited view of what they might find here.
> > > > > > > > >>>>>>
> > > > > > > > >>>>>> On Mon, May 17, 2021 at 12:14 AM Mauricio Vargas
> > > > > > > > >>>>>> <ma...@ursacomputing.com> wrote:
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>> How about
> > > > > > > > >>>>>>> 'Apache Arrow is a cross-language development platform for
> > > > > > > > >>>>>>> in-memory data. It enables systems to process and transport data
> > > > > > > > >>>>>>> efficiently, providing a simple and fast library for partitioning
> > > > > > > > >>>>>>> of large tables'?
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>> Sorry for the delay, long election day
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>> On Sun, May 16, 2021, 2:27 PM Nate Bauernfeind
> > > > > > > > >>>>>>> <natebauernfeind@deephaven.io> wrote:
> > > > > > > > >>>>>>>
> > > > > > > > >>>>>>>> Suggestion: faster -> more efficiently
> > > > > > > > >>>>>>>>
> > > > > > > > >>>>>>>> "Apache Arrow is a cross-language development platform for
> > > > > > > > >>>>>>>> in-memory data. It enables systems to process and transport data
> > > > > > > > >>>>>>>> more efficiently."
> > > > > > > > >>>>>>>>
> > > > > > > > >>>>>>>> On Sun, May 16, 2021 at 11:35 AM Wes McKinney <wesmckinn@gmail.com>
> > > > > > > > >>>>>>>> wrote:
> > > > > > > > >>>>>>>>
> > > > > > > > >>>>>>>>> Here's what's there now:
> > > > > > > > >>>>>>>>>
> > > > > > > > >>>>>>>>> "Apache Arrow is a cross-language development platform for
> > > > > > > > >>>>>>>>> in-memory data. It specifies a standardized language-independent
> > > > > > > > >>>>>>>>> columnar memory format for flat and hierarchical data, organized
> > > > > > > > >>>>>>>>> for efficient analytic operations on modern hardware. It also
> > > > > > > > >>>>>>>>> provides computational libraries and zero-copy streaming
> > > > > > > > >>>>>>>>> messaging and interprocess communication…"
> > > > > > > > >>>>>>>>>
> > > > > > > > >>>>>>>>> How about something shorter like
> > > > > > > > >>>>>>>>>
> > > > > > > > >>>>>>>>> "Apache Arrow is a cross-language development platform for
> > > > > > > > >>>>>>>>> in-memory data. It enables systems to process and transport data
> > > > > > > > >>>>>>>>> faster."
> > > > > > > > >>>>>>>>>
> > > > > > > > >>>>>>>>> Suggestions / refinements from others welcome
> > > > > > > > >>>>>>>>>
> > > > > > > > >>>>>>>>>
> > > > > > > > >>>>>>>>> On Sat, May 15, 2021 at 9:12 PM Dominik Moritz <domoritz@cmu.edu>
> > > > > > > > >>>>>>>>> wrote:
> > > > > > > > >>>>>>>>>>
> > > > > > > > >>>>>>>>>> Super minor issue but could someone make the description on
> > > > > > > > >>>>>>>>>> GitHub shorter?
> > > > > > > > >>>>>>>>>>
> > > > > > > > >>>>>>>>>> GitHub puts the description into the title of the page and makes
> > > > > > > > >>>>>>>>>> it hard to find it in URL autocomplete.
> > > > > > > > >>>>>>>>>>
>
> --
> Adam Hooper
> +1-514-882-9694
> http://adamhooper.com
>

Re: Long title on github page

Posted by Ian Cook <ia...@ursacomputing.com>.
Reopening this old thread to discuss whether we should change the
heading text on the Arrow website (https://arrow.apache.org) to match
this updated description in the GitHub repo.

I opened a Jira issue for this at
https://issues.apache.org/jira/browse/ARROW-14086. Please share
feedback here or in comments on the Jira issue.

On Sat, Jun 12, 2021 at 11:20 AM Joris Peeters
<jo...@gmail.com> wrote:
>
> +1
>
> On Sat, Jun 12, 2021 at 2:56 PM Wes McKinney <we...@gmail.com> wrote:
>
> > Thanks Kou! I have updated the description using .asf.yaml. Appreciate
> > everyone giving thought to this!
> >
> > On Thu, Jun 10, 2021 at 8:13 PM Sutou Kouhei <ko...@clear-code.com> wrote:
> > >
> > > It seems that we can use .asf.yaml to set the description on
> > > GitHub:
> > >
> > >
> > https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features#Git.asf.yamlfeatures-GitHubsettings
> > >
> > > github:
> > >   description: "Apache Arrow is ..."
> > >
> > > In <CA...@mail.gmail.com>
> > >   "Re: Long title on github page" on Thu, 10 Jun 2021 17:44:57 -0500,
> > >   Wes McKinney <we...@gmail.com> wrote:
> > >
> > > > I'll wait a day or two for more feedback to percolate and then ask
> > > > Infra to change the description on GitHub.
> > > >
> > > > On Thu, Jun 10, 2021 at 4:47 PM Adam Lippai <ad...@rigo.sk> wrote:
> > > >>
> > > >> +1
> > > >>
> > > >> On Thu, Jun 10, 2021, 23:38 Antoine Pitrou <an...@python.org> wrote:
> > > >>
> > > >> >
> > > >> > Sounds good enough to me.
> > > >> >
> > > >> >
> > > >> > On 10/06/2021 at 23:35, Wes McKinney wrote:
> > > >> > > I hate to reopen this can of worms again, but here is my effort to
> > > >> > > synthesize feedback:
> > > >> > >
> > > >> > > "Apache Arrow is a multi-language toolbox for accelerated data
> > > >> > > interchange and in-memory processing."
> > > >> > >
> > > >> > > On Thu, Jun 10, 2021 at 12:37 PM Dominik Moritz <domoritz@apache.org>
> > > >> > > wrote:
> > > >> > >>
> > > >> > >> I thought there were some good suggestions in this thread. @Wes,
> > did you
> > > >> > >> find a description you liked?
> > > >> > >>
> > > >> > >> On May 18, 2021 at 06:24:47, Adam Hooper <ad...@adamhooper.com>
> > wrote:
> > > >> > >>
> > > >> > >>> Poll question: why did you choose Arrow?
> > > >> > >>>
> > > >> > >>> Personally: I researched Arrow because it's a spec for IPC. (My
> > > >> > requirement
> > > >> > >>> was: "wrap computations in a separate process.") I chose Arrow
> > for its
> > > >> > >>> community and ecosystem -- in other words, because my peers
> > chose it.
> > > >> > >>>
> > > >> > >>> I happen to use the compute kernel and Parquet capabilities
> > every day;
> > > >> > but
> > > >> > >>> they did not sway me at all. I would choose Arrow if it were
> > nothing
> > > >> > but
> > > >> > >>> this spec and this community. (I chose HTML, after all.)
> > > >> > >>>
> > > >> > >>> I see the *code* as one enormous proof that the *spec* is good,
> > and as
> > > >> > a
> > > >> > >>> collection of examples and best practices.
> > > >> > >>>
> > > >> > >>> ... so a great pitch to me would be: "Apache Arrow is a data
> > format and
> > > >> > >>> toolbox for efficient in-memory processing."
> > > >> > >>>
> > > >> > >>> Enjoy life,
> > > >> > >>> Adam
> > > >> > >>>
> > > >> > >>> On Tue, May 18, 2021 at 2:38 AM Aldrin
> > <ak...@ucsc.edu.invalid>
> > > >> > wrote:
> > > >> > >>>
> > > >> > >>> "Apache Arrow is a data processing library that also provides a
> > > >> > uniform,
> > > >> > >>>
> > > >> > >>> efficient interface for data systems."
> > > >> > >>>
> > > >> > >>>
> > > >> > >>> This probably still isn't quite right, I imagine the bit about
> > "for
> > > >> > data
> > > >> > >>>
> > > >> > >>> systems" needs some addition (maybe "for transport between data
> > > >> > systems")?
> > > >> > >>>
> > > >> > >>>
> > > >> > >>> My primary motivators:
> > > >> > >>>
> > > >> > >>>
> > > >> > >>>     - "A data processing library":
> > > >> > >>>
> > > >> > >>>        - Arrow provides many language bindings, but ultimately
> > they're
> > > >> > all
> > > >> > >>>
> > > >> > >>>        part of the same "library ecosystem", which I think is
> > fine to
> > > >> > >>>
> > > >> > >>> capture in
> > > >> > >>>
> > > >> > >>>        "library"
> > > >> > >>>
> > > >> > >>>        - A main goal of arrow is for processing to be fast,
> > whatever
> > > >> > that
> > > >> > >>>
> > > >> > >>>        processing may be
> > > >> > >>>
> > > >> > >>>        - "uniform, efficient interface for data systems":
> > > >> > >>>
> > > >> > >>>        - Arrow provides (or tries to provide) a cohesive ("uniform")
> > > >> > interface for
> > > >> > >>>
> > > >> > >>>        data processing (although it has several APIs to do this)
> > > >> > >>>
> > > >> > >>>        - Also, IMO, a motivation for arrow was a format and
> > library to
> > > >> > >>>
> > > >> > >>>        facilitate processing, but that provided functions and
> > > >> > >>>
> > > >> > >>> interfaces to easily
> > > >> > >>>
> > > >> > >>>        translate into optimized data formats used by disparate
> > data
> > > >> > systems
> > > >> > >>>
> > > >> > >>>        (cassandra, hadoop, etc.).
> > > >> > >>>
> > > >> > >>>        - Arrow tries to be transparently zero-copy, which is
> > part of
> > > >> > the
> > > >> > >>>
> > > >> > >>>        interface for efficiency
> > > >> > >>>
> > > >> > >>>     - Arrow certainly has a data format, but that format is the
> > crux
> > > >> > of the
> > > >> > >>>
> > > >> > >>>     interface (IMO). However, it also makes using other formats
> > easy
> > > >> > (via
> > > >> > >>>
> > > >> > >>>     filesystem API and parquet reader/writers, etc.). So,
> > focusing on
> > > >> > the
> > > >> > >>>
> > > >> > >>> data
> > > >> > >>>
> > > >> > >>>     format seems unnecessary in such a terse description.
> > > >> > >>>
> > > >> > >>>
> > > >> > >>>
> > > >> > >>> Aldrin Montana
> > > >> > >>>
> > > >> > >>> Computer Science PhD Student
> > > >> > >>>
> > > >> > >>> UC Santa Cruz
> > > >> > >>>
> > > >> > >>>
> > > >> > >>>
> > > >> > >>> On Mon, May 17, 2021 at 5:07 PM Weston Pace <
> > weston.pace@gmail.com>
> > > >> > wrote:
> > > >> > >>>
> > > >> > >>>
> > > >> > >>>> I'd avoid the word "structured" as it is somewhat ill-defined.
> > > >> > >>>
> > > >> > >>>>
> > > >> > >>>
> > > >> > >>>> On Mon, May 17, 2021 at 12:37 PM Mauricio Vargas
> > > >> > >>>
> > > >> > >>>> <ma...@ursacomputing.com> wrote:
> > > >> > >>>
> > > >> > >>>>>
> > > >> > >>>
> > > >> > >>>>> more marketed:
> > > >> > >>>
> > > >> > >>>>> How about: "Apache Arrow is a format and language-agnostic
> > library
> > > >> > >>>
> > > >> > >>>> focused
> > > >> > >>>
> > > >> > >>>>> on efficient sharing and processing of structured data."
> > > >> > >>>
> > > >> > >>>>>
> > > >> > >>>
> > > >> > >>>>> On Mon, May 17, 2021 at 6:25 PM Micah Kornfield <
> > > >> > emkornfield@gmail.com
> > > >> > >>>
> > > >> > >>>>
> > > >> > >>>
> > > >> > >>>>> wrote:
> > > >> > >>>
> > > >> > >>>>>
> > > >> > >>>
> > > >> > >>>>>> How about: "Apache Arrow is a collection of specifications,
> > cross
> > > >> > >>>
> > > >> > >>>> language
> > > >> > >>>
> > > >> > >>>>>> libraries and applications focused on efficient sharing and
> > > >> > >>>
> > > >> > >>> processing
> > > >> > >>>
> > > >> > >>>> of
> > > >> > >>>
> > > >> > >>>>>> structured data."
> > > >> > >>>
> > > >> > >>>>>>
> > > >> > >>>
> > > >> > >>>>>> On Mon, May 17, 2021 at 3:06 PM Wes McKinney <
> > wesmckinn@gmail.com>
> > > >> > >>>
> > > >> > >>>> wrote:
> > > >> > >>>
> > > >> > >>>>>>
> > > >> > >>>
> > > >> > >>>>>>> On Mon, May 17, 2021 at 4:58 PM Weston Pace <
> > weston.pace@gmail.com
> > > >> > >>>
> > > >> > >>>>
> > > >> > >>>
> > > >> > >>>>>> wrote:
> > > >> > >>>
> > > >> > >>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>> “Apache Arrow is a format and compute kernel for in-memory
> > > >> > >>>
> > > >> > >>> data”
> > > >> > >>>
> > > >> > >>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>> I like this but no one ever knows what "in-memory" means
> > (or they
> > > >> > >>>
> > > >> > >>>> just
> > > >> > >>>
> > > >> > >>>>>>>> think 'data is always in memory').  How about...
> > > >> > >>>
> > > >> > >>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>> "Apache Arrow is a format and compute kernel for zero-copy
> > > >> > >>>
> > > >> > >>>> processing
> > > >> > >>>
> > > >> > >>>>>>>> and sharing of data."
> > > >> > >>>
> > > >> > >>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>> or...
> > > >> > >>>
> > > >> > >>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>> "Apache Arrow is a format and compute kernel for
> > processing and
> > > >> > >>>
> > > >> > >>>>>>>> sharing data without serialization overhead."
> > > >> > >>>
> > > >> > >>>>>>>
> > > >> > >>>
> > > >> > >>>>>>> A few issues with this:
> > > >> > >>>
> > > >> > >>>>>>>
> > > >> > >>>
> > > >> > >>>>>>> * Multiple PL aspect unclear (is a single piece of
> > software, or
> > > >> > >>>
> > > >> > >>>>>>> multiple pieces of software?)
> > > >> > >>>
> > > >> > >>>>>>> * Development platform aspect unclear
> > > >> > >>>
> > > >> > >>>>>>>
> > > >> > >>>
> > > >> > >>>>>>> I see that some people don't like the word "platform". Some
> > people
> > > >> > >>>
> > > >> > >>>>>>> come to this project and want to find an end-to-end
> > application,
> > > >> > >>>
> > > >> > >>>>>>> rather than a developer toolkit that they can use to build
> > > >> > >>>
> > > >> > >>>>>>> applications. Perhaps we should be more explicit and use
> > > >> > >>>
> > > >> > >>>>>>> "computational development toolkit" instead of "platform".
> > > >> > >>>
> > > >> > >>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>> Although marshalling[1] would probably be a more precise
> > word it
> > > >> > >>>
> > > >> > >>> is
> > > >> > >>>
> > > >> > >>>>>>>> not as well known.
> > > >> > >>>
> > > >> > >>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>> [1]
> > https://en.wikipedia.org/wiki/Marshalling_(computer_science)
> > > >> > >>>
> > > >> > >>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>> On Mon, May 17, 2021 at 9:36 AM Mauricio Vargas
> > > >> > >>>
> > > >> > >>>>>>>> <ma...@ursacomputing.com> wrote:
> > > >> > >>>
> > > >> > >>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>> a few ideas
> > > >> > >>>
> > > >> > >>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>> github.com/apache/arrow - Apache Arrow is an efficient
> > library
> > > >> > >>>
> > > >> > >>>> for
> > > >> > >>>
> > > >> > >>>>>>> big data
> > > >> > >>>
> > > >> > >>>>>>>>> processing and sharing
> > > >> > >>>
> > > >> > >>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>> github.com/apache/arrow - Apache Arrow is a
> > computational tool
> > > >> > >>>
> > > >> > >>>> for
> > > >> > >>>
> > > >> > >>>>>>>>> processing, storing and sharing large datasets
> > > >> > >>>
> > > >> > >>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>> github.com/apache/arrow - Apache Arrow is a  fast and
> > simple
> > > >> > >>>
> > > >> > >>>> library
> > > >> > >>>
> > > >> > >>>>>>> for
> > > >> > >>>
> > > >> > >>>>>>>>> big data analytics
> > > >> > >>>
> > > >> > >>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>> *github.com/apache/arrow <http://github.com/apache/arrow>
> > -
> > > >> > >>>
> > > >> > >>>> Apache
> > > >> > >>>
> > > >> > >>>>>>> Arrow is
> > > >> > >>>
> > > >> > >>>>>>>>> a powerful workhorse for analytic operations on modern
> > > >> > >>>
> > > >> > >>> hardware*
> > > >> > >>>
> > > >> > >>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>> On Mon, May 17, 2021 at 3:13 PM Julian Hyde <
> > > >> > >>>
> > > >> > >>>> jhyde.apache@gmail.com>
> > > >> > >>>
> > > >> > >>>>>>> wrote:
> > > >> > >>>
> > > >> > >>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>> Alright, well, whatever it is, it must fit into one
> > breath.
> > > >> > >>>
> > > >> > >>> If
> > > >> > >>>
> > > >> > >>>> the
> > > >> > >>>
> > > >> > >>>>>>>>>> high-concept pitch is successful, people will stick
> > around
> > > >> > >>>
> > > >> > >>> for
> > > >> > >>>
> > > >> > >>>> the
> > > >> > >>>
> > > >> > >>>>>>> full
> > > >> > >>>
> > > >> > >>>>>>>>>> pitch.
> > > >> > >>>
> > > >> > >>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>> Words such as “platform” and “enable” are noise. You say
> > > >> > >>>
> > > >> > >>>>>> “platform”,
> > > >> > >>>
> > > >> > >>>>>>> they
> > > >> > >>>
> > > >> > >>>>>>>>>> start to say “what exactly do you mean by platform”, the
> > > >> > >>>
> > > >> > >>>> elevator
> > > >> > >>>
> > > >> > >>>>>>> doors
> > > >> > >>>
> > > >> > >>>>>>>>>> open, and they’re gone.
> > > >> > >>>
> > > >> > >>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>> “Apache Arrow is a format and compute kernel for
> > in-memory
> > > >> > >>>
> > > >> > >>>> data”
> > > >> > >>>
> > > >> > >>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>> On May 17, 2021, at 12:03 PM, Eduardo Ponce <
> > > >> > >>>
> > > >> > >>>> edponce00@gmail.com
> > > >> > >>>
> > > >> > >>>>>>>
> > > >> > >>>
> > > >> > >>>>>>> wrote:
> > > >> > >>>
> > > >> > >>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>> One more suggestion for the bucket:
> > > >> > >>>
> > > >> > >>>>>>>>>>> "Apache Arrow is a computational platform for efficient
> > > >> > >>>
> > > >> > >>>> in-memory
> > > >> > >>>
> > > >> > >>>>>>> data
> > > >> > >>>
> > > >> > >>>>>>>>>>> representation and processing."
> > > >> > >>>
> > > >> > >>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>> On Mon, May 17, 2021 at 2:49 PM Wes McKinney <
> > > >> > >>>
> > > >> > >>>>>> wesmckinn@gmail.com>
> > > >> > >>>
> > > >> > >>>>>>>>>> wrote:
> > > >> > >>>
> > > >> > >>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>> I think less is better in the description, but
> > > >> > >>>
> > > >> > >>>> unfortunately the
> > > >> > >>>
> > > >> > >>>>>>>>>>>> association of Arrow as being "just a data format" has
> > > >> > >>>
> > > >> > >>> been
> > > >> > >>>
> > > >> > >>>>>>> actively
> > > >> > >>>
> > > >> > >>>>>>>>>>>> harmful in some ways to community growth. We have a
> > data
> > > >> > >>>
> > > >> > >>>> format,
> > > >> > >>>
> > > >> > >>>>>>> yes,
> > > >> > >>>
> > > >> > >>>>>>>>>>>> but we are also creating a computational platform to go
> > > >> > >>>
> > > >> > >>>>>>> hand-in-hand
> > > >> > >>>
> > > >> > >>>>>>>>>>>> with the data format to make it easier to build fast
> > > >> > >>>
> > > >> > >>>>>> applications
> > > >> > >>>
> > > >> > >>>>>>> that
> > > >> > >>>
> > > >> > >>>>>>>>>>>> use the data format. So the description needs to
> > capture
> > > >> > >>>
> > > >> > >>>> both of
> > > >> > >>>
> > > >> > >>>>>>> these
> > > >> > >>>
> > > >> > >>>>>>>>>>>> ideas.
> > > >> > >>>
> > > >> > >>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>> On Mon, May 17, 2021 at 12:15 PM Julian Hyde <
> > > >> > >>>
> > > >> > >>>>>>> jhyde.apache@gmail.com>
> > > >> > >>>
> > > >> > >>>>>>>>>>>> wrote:
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>> I think that the “cross-language development platform
> > > >> > >>>
> > > >> > >>> for”
> > > >> > >>>
> > > >> > >>>> is
> > > >> > >>>
> > > >> > >>>>>>> noise.
> > > >> > >>>
> > > >> > >>>>>>>>>>>> (I’m sure that JPEG developers think that JPEG is a
> > > >> > >>>
> > > >> > >>>>>>> “cross-language
> > > >> > >>>
> > > >> > >>>>>>>>>>>> development platform” too. But it isn’t. It is an image
> > > >> > >>>
> > > >> > >>>> format.)
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>> "Apache Arrow is a data format for efficient in-memory
> > > >> > >>>
> > > >> > >>>>>> processing.”
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>> I’ll note that in marketing speak, we are developing a
> > > >> > >>>
> > > >> > >>>>>>> high-concept
> > > >> > >>>
> > > >> > >>>>>>>>>>>> pitch [1] here. Every company needs a name, a brand, a
> > > >> > >>>
> > > >> > >>>>>>> high-concept
> > > >> > >>>
> > > >> > >>>>>>>>>> pitch,
> > > >> > >>>
> > > >> > >>>>>>>>>>>> and 3- or 4-sentence description. But every Apache
> > project
> > > >> > >>>
> > > >> > >>>> needs
> > > >> > >>>
> > > >> > >>>>>>> these
> > > >> > >>>
> > > >> > >>>>>>>>>> too.
> > > >> > >>>
> > > >> > >>>>>>>>>>>> It’s worth spending the time on the description, also,
> > and
> > > >> > >>>
> > > >> > >>>> then
> > > >> > >>>
> > > >> > >>>>>>> use
> > > >> > >>>
> > > >> > >>>>>>>>>> them in
> > > >> > >>>
> > > >> > >>>>>>>>>>>> all the places that we describe Arrow.
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>> Julian
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>> [1]
> > > >> > >>>
> > > >> > >>>>>>>
> > https://www.growthink.com/content/whats-your-high-concept-pitch
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>> On May 17, 2021, at 7:38 AM, Eduardo Ponce <
> > > >> > >>>
> > > >> > >>>>>> edponce00@gmail.com
> > > >> > >>>
> > > >> > >>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>> wrote:
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>> I agree with Nate's and Brian's suggestions, but
> > would
> > > >> > >>>
> > > >> > >>>> like to
> > > >> > >>>
> > > >> > >>>>>>> add
> > > >> > >>>
> > > >> > >>>>>>>>>>>> that we
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>> can make it a one-liner for more conciseness and
> > > >> > >>>
> > > >> > >>>> consistency
> > > >> > >>>
> > > >> > >>>>>>> with
> > > >> > >>>
> > > >> > >>>>>>>>>> other
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>> Apache projects.
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>> Apologies if it seems I am going around the
> > suggestions
> > > >> > >>>
> > > >> > >>>> loop
> > > >> > >>>
> > > >> > >>>>>>> again.
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>> "Apache Arrow is a cross-language development
> > platform
> > > >> > >>>
> > > >> > >>>>>> enabling
> > > >> > >>>
> > > >> > >>>>>>>>>>>> efficient
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>> in-memory data processing and transport."
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>> On Mon, May 17, 2021 at 10:11 AM Brian Hulette <
> > > >> > >>>
> > > >> > >>>>>>> bhulette@apache.org>
> > > >> > >>>
> > > >> > >>>>>>>>>>>> wrote:
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> Thank you for bringing this up Dominik. I sampled
> > some
> > > >> > >>>
> > > >> > >>>> of the
> > > >> > >>>
> > > >> > >>>>>>>>>>>> descriptions
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> for other Apache projects I frequent, the ones with
> > a
> > > >> > >>>
> > > >> > >>>>>>> meaningful
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> description have a single sentence:
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> github.com/apache/spark - Apache Spark - A unified
> > > >> > >>>
> > > >> > >>>> analytics
> > > >> > >>>
> > > >> > >>>>>>> engine
> > > >> > >>>
> > > >> > >>>>>>>>>>>> for
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> large-scale data processing
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> github.com/apache/beam - Apache Beam is a unified
> > > >> > >>>
> > > >> > >>>>>> programming
> > > >> > >>>
> > > >> > >>>>>>> model
> > > >> > >>>
> > > >> > >>>>>>>>>>>> for
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> Batch and Streaming
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> github.com/apache/avro - Apache Avro is a data
> > > >> > >>>
> > > >> > >>>> serialization
> > > >> > >>>
> > > >> > >>>>>>> system
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> Several others (Flink, Hadoop, ...) just have
> > "[Mirror
> > > >> > >>>
> > > >> > >>>> of]
> > > >> > >>>
> > > >> > >>>>>>> Apache
> > > >> > >>>
> > > >> > >>>>>>>>>>>> <name>"
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> as the description.
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> +1 for Nate's suggestion "Apache Arrow is a
> > > >> > >>>
> > > >> > >>>> cross-language
> > > >> > >>>
> > > >> > >>>>>>>>>> development
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> platform for in-memory data. It enables systems to
> > > >> > >>>
> > > >> > >>>> process
> > > >> > >>>
> > > >> > >>>>>> and
> > > >> > >>>
> > > >> > >>>>>>>>>>>> transport
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> data more efficiently."
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> On Mon, May 17, 2021 at 5:23 AM Wes McKinney <
> > > >> > >>>
> > > >> > >>>>>>> wesmckinn@gmail.com>
> > > >> > >>>
> > > >> > >>>>>>>>>>>> wrote:
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>> It's probably best for description to limit
> > mentions
> > > >> > >>>
> > > >> > >>> of
> > > >> > >>>
> > > >> > >>>>>>> specific
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>> features. There are some high level features
> > mentioned
> > > >> > >>>
> > > >> > >>>> in
> > > >> > >>>
> > > >> > >>>>>> the
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>> description now ("computational libraries and
> > > >> > >>>
> > > >> > >>> zero-copy
> > > >> > >>>
> > > >> > >>>>>>> streaming
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>> messaging and interprocess communication"), but
> > now in
> > > >> > >>>
> > > >> > >>>> 2021
> > > >> > >>>
> > > >> > >>>>>>> since
> > > >> > >>>
> > > >> > >>>>>>>>>> the
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>> project has grown so much, it could leave people
> > with
> > > >> > >>>
> > > >> > >>> a
> > > >> > >>>
> > > >> > >>>>>>> limited view
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>> of what they might find here.
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>> On Mon, May 17, 2021 at 12:14 AM Mauricio Vargas
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>> <ma...@ursacomputing.com> wrote:
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>> How about
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>> 'Apache Arrow is a cross-language development
> > > >> > >>>
> > > >> > >>> platform
> > > >> > >>>
> > > >> > >>>> for
> > > >> > >>>
> > > >> > >>>>>>>>>> in-memory
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>> data.
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>> It enables systems to process and transport data
> > > >> > >>>
> > > >> > >>>>>> efficiently,
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> providing a
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>> simple and fast library for partitioning of large
> > > >> > >>>
> > > >> > >>>> tables'?
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>> Sorry for the delay, long election day
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>> On Sun, May 16, 2021, 2:27 PM Nate Bauernfeind <
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>> natebauernfeind@deephaven.io>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>> wrote:
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>> Suggestion: faster -> more efficiently
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development
> > > >> > >>>
> > > >> > >>>> platform for
> > > >> > >>>
> > > >> > >>>>>>>>>>>> in-memory
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>> data. It enables systems to process and transport
> > > >> > >>>
> > > >> > >>> data
> > > >> > >>>
> > > >> > >>>>>> more
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>> efficiently."
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>> On Sun, May 16, 2021 at 11:35 AM Wes McKinney <
> > > >> > >>>
> > > >> > >>>>>>>>>> wesmckinn@gmail.com
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>> wrote:
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>> Here's what's there now:
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development
> > > >> > >>>
> > > >> > >>>> platform
> > > >> > >>>
> > > >> > >>>>>> for
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> in-memory
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>> data. It specifies a standardized
> > > >> > >>>
> > > >> > >>>> language-independent
> > > >> > >>>
> > > >> > >>>>>>> columnar
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>> memory
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>> format for flat and hierarchical data, organized
> > > >> > >>>
> > > >> > >>> for
> > > >> > >>>
> > > >> > >>>>>>> efficient
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>> analytic operations on modern hardware. It also
> > > >> > >>>
> > > >> > >>>> provides
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>> computational
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>> libraries and zero-copy streaming messaging and
> > > >> > >>>
> > > >> > >>>>>>> interprocess
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>> communication…"
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>> How about something shorter like
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development
> > > >> > >>>
> > > >> > >>>> platform
> > > >> > >>>
> > > >> > >>>>>> for
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> in-memory
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>> data. It enables systems to process and
> > transport
> > > >> > >>>
> > > >> > >>>> data
> > > >> > >>>
> > > >> > >>>>>>> faster."
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>> Suggestions / refinements from others welcome
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>> On Sat, May 15, 2021 at 9:12 PM Dominik Moritz <
> > > >> > >>>
> > > >> > >>>>>>> domoritz@cmu.edu
> > > >> > >>>
> > > >> > >>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>> wrote:
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>>> Super minor issue but could someone make the
> > > >> > >>>
> > > >> > >>>> description
> > > >> > >>>
> > > >> > >>>>>>> on
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> GitHub
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>> shorter?
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>>> GitHub puts the description into the title of
> > the
> > > >> > >>>
> > > >> > >>>> page
> > > >> > >>>
> > > >> > >>>>>>> and makes
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>> it
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>> hard
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>> to find it in URL autocomplete.
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>> --
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>>>>
> > > >> > >>>
> > > >> > >>>>>>>
> > > >> > >>>
> > > >> > >>>>>>
> > > >> > >>>
> > > >> > >>>>
> > > >> > >>>
> > > >> > >>>
> > > >> > >>>
> > > >> > >>>
> > > >> > >>> --
> > > >> > >>> Adam Hooper
> > > >> > >>> +1-514-882-9694
> > > >> > >>> http://adamhooper.com
> > > >> > >>>
> > > >> >
> >

Re: Long title on github page

Posted by Joris Peeters <jo...@gmail.com>.
+1

On Sat, Jun 12, 2021 at 2:56 PM Wes McKinney <we...@gmail.com> wrote:

> Thanks Kou! I have updated the description using .asf.yaml. Appreciate
> everyone giving thought to this!
>
> On Thu, Jun 10, 2021 at 8:13 PM Sutou Kouhei <ko...@clear-code.com> wrote:
> >
> > It seems that we can use .asf.yaml to set the description on
> > GitHub:
> >
> >
> > https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features#Git.asf.yamlfeatures-GitHubsettings
> >
> > github:
> >   description: "Apache Arrow is ..."
> >
> > In <CA...@mail.gmail.com>
> >   "Re: Long title on github page" on Thu, 10 Jun 2021 17:44:57 -0500,
> >   Wes McKinney <we...@gmail.com> wrote:
> >
> > > I'll wait a day or two for more feedback to percolate and then ask
> > > Infra to change the description on GitHub.
> > >
> > > On Thu, Jun 10, 2021 at 4:47 PM Adam Lippai <ad...@rigo.sk> wrote:
> > >>
> > >> +1
> > >>
> > >> On Thu, Jun 10, 2021, 23:38 Antoine Pitrou <an...@python.org> wrote:
> > >>
> > >> >
> > >> > Sounds good enough to me.
> > >> >
> > >> >
> > >> > On 10/06/2021 at 23:35, Wes McKinney wrote:
> > >> > > I hate to reopen this can of worms again, but here is my effort to
> > >> > > synthesize feedback:
> > >> > >
> > >> > > "Apache Arrow is a multi-language toolbox for accelerated data
> > >> > > interchange and in-memory processing."
> > >> > >
> > >> > > On Thu, Jun 10, 2021 at 12:37 PM Dominik Moritz <domoritz@apache.org> wrote:
> > >> > >>
> > >> > >> I thought there were some good suggestions in this thread. @Wes, did you
> > >> > >> find a description you liked?
> > >> > >>
> > >> > >> On May 18, 2021 at 06:24:47, Adam Hooper <ad...@adamhooper.com> wrote:
> > >> > >>
> > >> > >>> Poll question: why did you choose Arrow?
> > >> > >>>
> > >> > >>> Personally: I researched Arrow because it's a spec for IPC. (My requirement
> > >> > >>> was: "wrap computations in a separate process.") I chose Arrow for its
> > >> > >>> community and ecosystem -- in other words, because my peers chose it.
> > >> > >>>
> > >> > >>> I happen to use the compute kernel and Parquet capabilities every day; but
> > >> > >>> they did not sway me at all. I would choose Arrow if it were nothing but
> > >> > >>> this spec and this community. (I chose HTML, after all.)
> > >> > >>>
> > >> > >>> I see the *code* as one enormous proof that the *spec* is good, and as a
> > >> > >>> collection of examples and best practices.
> > >> > >>>
> > >> > >>> ... so a great pitch to me would be: "Apache Arrow is a data format and
> > >> > >>> toolbox for efficient in-memory processing."
> > >> > >>>
> > >> > >>> Enjoy life,
> > >> > >>> Adam
> > >> > >>>
> > >> > >>> On Tue, May 18, 2021 at 2:38 AM Aldrin <ak...@ucsc.edu.invalid> wrote:
> > >> > >>>
> > >> > >>> "Apache Arrow is a data processing library that also provides a uniform,
> > >> > >>> efficient interface for data systems."
> > >> > >>>
> > >> > >>> This probably still isn't quite right, I imagine the bit about "for data
> > >> > >>> systems" needs some addition (maybe "for transport between data systems")?
> > >> > >>>
> > >> > >>> My primary motivators:
> > >> > >>>
> > >> > >>>     - "A data processing library":
> > >> > >>>        - Arrow provides many language bindings, but ultimately they're all
> > >> > >>>        part of the same "library ecosystem", which I think is fine to
> > >> > >>>        capture in "library"
> > >> > >>>        - A main goal of arrow is for processing to be fast, whatever that
> > >> > >>>        processing may be
> > >> > >>>     - "uniform, efficient interface for data systems":
> > >> > >>>        - Arrow provides (or tries to) a cohesive ("uniform") interface for
> > >> > >>>        data processing (although it has several APIs to do this)
> > >> > >>>        - Also, IMO, a motivation for arrow was a format and library to
> > >> > >>>        facilitate processing, but that provided functions and interfaces
> > >> > >>>        to easily translate into optimized data formats used by disparate
> > >> > >>>        data systems (cassandra, hadoop, etc.).
> > >> > >>>        - Arrow tries to be transparently zero-copy, which is part of the
> > >> > >>>        interface for efficiency
> > >> > >>>     - Arrow certainly has a data format, but that format is the crux of the
> > >> > >>>     interface (IMO). However, it also makes using other formats easy (via
> > >> > >>>     filesystem API and parquet reader/writers, etc.). So, focusing on the
> > >> > >>>     data format seems unnecessary in such a terse description.
> > >> > >>>
> > >> > >>>
> > >> > >>>
> > >> > >>> Aldrin Montana
> > >> > >>>
> > >> > >>> Computer Science PhD Student
> > >> > >>>
> > >> > >>> UC Santa Cruz
> > >> > >>>
> > >> > >>>
> > >> > >>>
> > >> > >>> On Mon, May 17, 2021 at 5:07 PM Weston Pace <weston.pace@gmail.com> wrote:
> > >> > >>>
> > >> > >>>
> > >> > >>>> I'd avoid the word "structured" as it is somewhat ill-defined.
> > >> > >>>
> > >> > >>>>
> > >> > >>>
> > >> > >>>> On Mon, May 17, 2021 at 12:37 PM Mauricio Vargas
> > >> > >>>
> > >> > >>>> <ma...@ursacomputing.com> wrote:
> > >> > >>>
> > >> > >>>>>
> > >> > >>>
> > >> > >>>>> more marketed:
> > >> > >>>
> > >> > >>>>> How about: "Apache Arrow is a format and language-agnostic
> library
> > >> > >>>
> > >> > >>>> focused
> > >> > >>>
> > >> > >>>>> on efficient sharing and processing of structured data."
> > >> > >>>
> > >> > >>>>>
> > >> > >>>
> > >> > >>>>> On Mon, May 17, 2021 at 6:25 PM Micah Kornfield <
> > >> > emkornfield@gmail.com
> > >> > >>>
> > >> > >>>>
> > >> > >>>
> > >> > >>>>> wrote:
> > >> > >>>
> > >> > >>>>>
> > >> > >>>
> > >> > >>>>>> How about: "Apache Arrow is a collection of specifications,
> cross
> > >> > >>>
> > >> > >>>> language
> > >> > >>>
> > >> > >>>>>> libraries and applications focused on efficient sharing and
> > >> > >>>
> > >> > >>> processing
> > >> > >>>
> > >> > >>>> of
> > >> > >>>
> > >> > >>>>>> structured data."
> > >> > >>>
> > >> > >>>>>>
> > >> > >>>
> > >> > >>>>>> On Mon, May 17, 2021 at 3:06 PM Wes McKinney <
> wesmckinn@gmail.com>
> > >> > >>>
> > >> > >>>> wrote:
> > >> > >>>
> > >> > >>>>>>
> > >> > >>>
> > >> > >>>>>>> On Mon, May 17, 2021 at 4:58 PM Weston Pace <
> weston.pace@gmail.com
> > >> > >>>
> > >> > >>>>
> > >> > >>>
> > >> > >>>>>> wrote:
> > >> > >>>
> > >> > >>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>> “Apache Arrow is a format and compute kernel for in-memory
> > >> > >>>
> > >> > >>> data”
> > >> > >>>
> > >> > >>>>>>>>
> > >> > >>>
> > >> > >>>>>>>> I like this but no one ever knows what "in-memory" means
> (or they
> > >> > >>>
> > >> > >>>> just
> > >> > >>>
> > >> > >>>>>>>> think 'data is always in memory').  How about...
> > >> > >>>
> > >> > >>>>>>>>
> > >> > >>>
> > >> > >>>>>>>> "Apache Arrow is a format and compute kernel for zero-copy
> > >> > >>>
> > >> > >>>> processing
> > >> > >>>
> > >> > >>>>>>>> and sharing of data."
> > >> > >>>
> > >> > >>>>>>>>
> > >> > >>>
> > >> > >>>>>>>> or...
> > >> > >>>
> > >> > >>>>>>>>
> > >> > >>>
> > >> > >>>>>>>> "Apache Arrow is a format and compute kernel for
> processing and
> > >> > >>>
> > >> > >>>>>>>> sharing data without serialization overhead."
> > >> > >>>
> > >> > >>>>>>>
> > >> > >>>
> > >> > >>>>>>> A few issues with this:
> > >> > >>>
> > >> > >>>>>>>
> > >> > >>>
> > >> > >>>>>>> * Multiple PL aspect unclear (is it a single piece of
> software, or
> > >> > >>>
> > >> > >>>>>>> multiple pieces of software?)
> > >> > >>>
> > >> > >>>>>>> * Development platform aspect unclear
> > >> > >>>
> > >> > >>>>>>>
> > >> > >>>
> > >> > >>>>>>> I see that some people don't like the word "platform". Some
> people
> > >> > >>>
> > >> > >>>>>>> come to this project and want to find an end-to-end
> application,
> > >> > >>>
> > >> > >>>>>>> rather than a developer toolkit that they can use to build
> > >> > >>>
> > >> > >>>>>>> applications. Perhaps we should be more explicit and use
> > >> > >>>
> > >> > >>>>>>> "computational development toolkit" instead of "platform".
> > >> > >>>
> > >> > >>>>>>>
> > >> > >>>
> > >> > >>>>>>>> Although marshalling[1] would probably be a more precise
> word it
> > >> > >>>
> > >> > >>> is
> > >> > >>>
> > >> > >>>>>>>> not as well known.
> > >> > >>>
> > >> > >>>>>>>>
> > >> > >>>
> > >> > >>>>>>>> [1]
> https://en.wikipedia.org/wiki/Marshalling_(computer_science)
> > >> > >>>
> > >> > >>>>>>>>
> > >> > >>>
> > >> > >>>>>>>> On Mon, May 17, 2021 at 9:36 AM Mauricio Vargas
> > >> > >>>
> > >> > >>>>>>>> <ma...@ursacomputing.com> wrote:
> > >> > >>>
> > >> > >>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>> a few ideas
> > >> > >>>
> > >> > >>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>> github.com/apache/arrow - Apache Arrow is an efficient
> library
> > >> > >>>
> > >> > >>>> for
> > >> > >>>
> > >> > >>>>>>> big data
> > >> > >>>
> > >> > >>>>>>>>> processing and sharing
> > >> > >>>
> > >> > >>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>> github.com/apache/arrow - Apache Arrow is a
> computational tool
> > >> > >>>
> > >> > >>>> for
> > >> > >>>
> > >> > >>>>>>>>> processing, storing and sharing large datasets
> > >> > >>>
> > >> > >>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>> github.com/apache/arrow - Apache Arrow is a fast and
> simple
> > >> > >>>
> > >> > >>>> library
> > >> > >>>
> > >> > >>>>>>> for
> > >> > >>>
> > >> > >>>>>>>>> big data analytics
> > >> > >>>
> > >> > >>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>> *github.com/apache/arrow <http://github.com/apache/arrow>
> -
> > >> > >>>
> > >> > >>>> Apache
> > >> > >>>
> > >> > >>>>>>> Arrow is
> > >> > >>>
> > >> > >>>>>>>>> a powerful workhorse for analytic operations on modern
> > >> > >>>
> > >> > >>> hardware*
> > >> > >>>
> > >> > >>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>> On Mon, May 17, 2021 at 3:13 PM Julian Hyde <
> > >> > >>>
> > >> > >>>> jhyde.apache@gmail.com>
> > >> > >>>
> > >> > >>>>>>> wrote:
> > >> > >>>
> > >> > >>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>> Alright, well, whatever it is, it must fit into one
> breath.
> > >> > >>>
> > >> > >>> If
> > >> > >>>
> > >> > >>>> the
> > >> > >>>
> > >> > >>>>>>>>>> high-concept pitch is successful, people will stick
> around
> > >> > >>>
> > >> > >>> for
> > >> > >>>
> > >> > >>>> the
> > >> > >>>
> > >> > >>>>>>> full
> > >> > >>>
> > >> > >>>>>>>>>> pitch.
> > >> > >>>
> > >> > >>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>> Words such as “platform” and “enable” are noise. You say
> > >> > >>>
> > >> > >>>>>> “platform”,
> > >> > >>>
> > >> > >>>>>>> they
> > >> > >>>
> > >> > >>>>>>>>>> start to say “what exactly do you mean by platform”, the
> > >> > >>>
> > >> > >>>> elevator
> > >> > >>>
> > >> > >>>>>>> doors
> > >> > >>>
> > >> > >>>>>>>>>> open, and they’re gone.
> > >> > >>>
> > >> > >>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>> “Apache Arrow is a format and compute kernel for
> in-memory
> > >> > >>>
> > >> > >>>> data”
> > >> > >>>
> > >> > >>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>> On May 17, 2021, at 12:03 PM, Eduardo Ponce <
> > >> > >>>
> > >> > >>>> edponce00@gmail.com
> > >> > >>>
> > >> > >>>>>>>
> > >> > >>>
> > >> > >>>>>>> wrote:
> > >> > >>>
> > >> > >>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>> One more suggestion for the bucket:
> > >> > >>>
> > >> > >>>>>>>>>>> "Apache Arrow is a computational platform for efficient
> > >> > >>>
> > >> > >>>> in-memory
> > >> > >>>
> > >> > >>>>>>> data
> > >> > >>>
> > >> > >>>>>>>>>>> representation and processing."
> > >> > >>>
> > >> > >>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>> On Mon, May 17, 2021 at 2:49 PM Wes McKinney <
> > >> > >>>
> > >> > >>>>>> wesmckinn@gmail.com>
> > >> > >>>
> > >> > >>>>>>>>>> wrote:
> > >> > >>>
> > >> > >>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>> I think less is better in the description, but
> > >> > >>>
> > >> > >>>> unfortunately the
> > >> > >>>
> > >> > >>>>>>>>>>>> association of Arrow as being "just a data format" has
> > >> > >>>
> > >> > >>> been
> > >> > >>>
> > >> > >>>>>>> actively
> > >> > >>>
> > >> > >>>>>>>>>>>> harmful in some ways to community growth. We have a
> data
> > >> > >>>
> > >> > >>>> format,
> > >> > >>>
> > >> > >>>>>>> yes,
> > >> > >>>
> > >> > >>>>>>>>>>>> but we are also creating a computational platform to go
> > >> > >>>
> > >> > >>>>>>> hand-in-hand
> > >> > >>>
> > >> > >>>>>>>>>>>> with the data format to make it easier to build fast
> > >> > >>>
> > >> > >>>>>> applications
> > >> > >>>
> > >> > >>>>>>> that
> > >> > >>>
> > >> > >>>>>>>>>>>> use the data format. So the description needs to
> capture
> > >> > >>>
> > >> > >>>> both of
> > >> > >>>
> > >> > >>>>>>> these
> > >> > >>>
> > >> > >>>>>>>>>>>> ideas.
> > >> > >>>
> > >> > >>>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>> On Mon, May 17, 2021 at 12:15 PM Julian Hyde <
> > >> > >>>
> > >> > >>>>>>> jhyde.apache@gmail.com>
> > >> > >>>
> > >> > >>>>>>>>>>>> wrote:
> > >> > >>>
> > >> > >>>>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>>> I think that the “cross-language development platform
> > >> > >>>
> > >> > >>> for”
> > >> > >>>
> > >> > >>>> is
> > >> > >>>
> > >> > >>>>>>> noise.
> > >> > >>>
> > >> > >>>>>>>>>>>> (I’m sure that JPEG developers think that JPEG is a
> > >> > >>>
> > >> > >>>>>>> “cross-language
> > >> > >>>
> > >> > >>>>>>>>>>>> development platform” too. But it isn’t. It is an image
> > >> > >>>
> > >> > >>>> format.)
> > >> > >>>
> > >> > >>>>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>>> "Apache Arrow is a data format for efficient in-memory
> > >> > >>>
> > >> > >>>>>> processing.”
> > >> > >>>
> > >> > >>>>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>>> I’ll note that in marketing speak, we are developing a
> > >> > >>>
> > >> > >>>>>>> high-concept
> > >> > >>>
> > >> > >>>>>>>>>>>> pitch [1] here. Every company needs a name, a brand, a
> > >> > >>>
> > >> > >>>>>>> high-concept
> > >> > >>>
> > >> > >>>>>>>>>> pitch,
> > >> > >>>
> > >> > >>>>>>>>>>>> and 3- or 4-sentence description. But every Apache
> project
> > >> > >>>
> > >> > >>>> needs
> > >> > >>>
> > >> > >>>>>>> these
> > >> > >>>
> > >> > >>>>>>>>>> too.
> > >> > >>>
> > >> > >>>>>>>>>>>> It’s worth spending the time on the description, also,
> and
> > >> > >>>
> > >> > >>>> then
> > >> > >>>
> > >> > >>>>>>> use
> > >> > >>>
> > >> > >>>>>>>>>> them in
> > >> > >>>
> > >> > >>>>>>>>>>>> all the places that we describe Arrow.
> > >> > >>>
> > >> > >>>>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>>> Julian
> > >> > >>>
> > >> > >>>>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>>> [1]
> > >> > >>>
> > >> > >>>>>>>
> https://www.growthink.com/content/whats-your-high-concept-pitch
> > >> > >>>
> > >> > >>>>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>>>> On May 17, 2021, at 7:38 AM, Eduardo Ponce <
> > >> > >>>
> > >> > >>>>>> edponce00@gmail.com
> > >> > >>>
> > >> > >>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>> wrote:
> > >> > >>>
> > >> > >>>>>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>>>> I agree with Nate's and Brian's suggestions, but
> would
> > >> > >>>
> > >> > >>>> like to
> > >> > >>>
> > >> > >>>>>>> add
> > >> > >>>
> > >> > >>>>>>>>>>>> that we
> > >> > >>>
> > >> > >>>>>>>>>>>>>> can make it a one-liner for more conciseness and
> > >> > >>>
> > >> > >>>> consistency
> > >> > >>>
> > >> > >>>>>>> with
> > >> > >>>
> > >> > >>>>>>>>>> other
> > >> > >>>
> > >> > >>>>>>>>>>>>>> Apache projects.
> > >> > >>>
> > >> > >>>>>>>>>>>>>> Apologies if it seems I am going around the
> suggestions
> > >> > >>>
> > >> > >>>> loop
> > >> > >>>
> > >> > >>>>>>> again.
> > >> > >>>
> > >> > >>>>>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>>>> "Apache Arrow is a cross-language development
> platform
> > >> > >>>
> > >> > >>>>>> enabling
> > >> > >>>
> > >> > >>>>>>>>>>>> efficient
> > >> > >>>
> > >> > >>>>>>>>>>>>>> in-memory data processing and transport."
> > >> > >>>
> > >> > >>>>>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>>>> On Mon, May 17, 2021 at 10:11 AM Brian Hulette <
> > >> > >>>
> > >> > >>>>>>> bhulette@apache.org>
> > >> > >>>
> > >> > >>>>>>>>>>>> wrote:
> > >> > >>>
> > >> > >>>>>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>>>>> Thank you for bringing this up Dominik. I sampled
> some
> > >> > >>>
> > >> > >>>> of the
> > >> > >>>
> > >> > >>>>>>>>>>>> descriptions
> > >> > >>>
> > >> > >>>>>>>>>>>>>>> for other Apache projects I frequent, the ones with
> a
> > >> > >>>
> > >> > >>>>>>> meaningful
> > >> > >>>
> > >> > >>>>>>>>>>>>>>> description have a single sentence:
> > >> > >>>
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>
> > >> > >>>>>>>>>>>>>>> github.com/apache/spark - Apache Spark - A unified
> > >> > >>>
> > >> > >>>> analytics
> > >> > >>>
> > >> > >>>>>>> engine
> > >> > >>>
> > >> > >>>>>>>>>>>> for
> > >> > >>>
> > >> > >>>>>>>>>>>>>>> large-scale data processing
> > >> > >>>
> > >> > >>>>>>>>>>>>>>> github.com/apache/beam - Apache Beam is a unified programming model for
> > >> > >>>>>>>>>>>>>>> Batch and Streaming
> > >> > >>>>>>>>>>>>>>> github.com/apache/avro - Apache Avro is a data serialization system
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> Several others (Flink, Hadoop, ...) just have "[Mirror of] Apache <name>"
> > >> > >>>>>>>>>>>>>>> as the description.
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> +1 for Nate's suggestion "Apache Arrow is a cross-language development
> > >> > >>>>>>>>>>>>>>> platform for in-memory data. It enables systems to process and transport
> > >> > >>>>>>>>>>>>>>> data more efficiently."
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> On Mon, May 17, 2021 at 5:23 AM Wes McKinney <wesmckinn@gmail.com> wrote:
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>> It's probably best for description to limit mentions of specific
> > >> > >>>>>>>>>>>>>>>> features. There are some high level features mentioned in the
> > >> > >>>>>>>>>>>>>>>> description now ("computational libraries and zero-copy streaming
> > >> > >>>>>>>>>>>>>>>> messaging and interprocess communication"), but now in 2021 since the
> > >> > >>>>>>>>>>>>>>>> project has grown so much, it could leave people with a limited view
> > >> > >>>>>>>>>>>>>>>> of what they might find here.
> > >> > >>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>> On Mon, May 17, 2021 at 12:14 AM Mauricio Vargas
> > >> > >>>>>>>>>>>>>>>> <ma...@ursacomputing.com> wrote:
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>> How about
> > >> > >>>>>>>>>>>>>>>>> 'Apache Arrow is a cross-language development platform for in-memory data.
> > >> > >>>>>>>>>>>>>>>>> It enables systems to process and transport data efficiently, providing a
> > >> > >>>>>>>>>>>>>>>>> simple and fast library for partitioning of large tables'?
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>> Sorry for the delay, long election day
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>> On Sun, May 16, 2021, 2:27 PM Nate Bauernfeind <natebauernfeind@deephaven.io> wrote:
> > >> > >>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>> Suggestion: faster -> more efficiently
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development platform for in-memory
> > >> > >>>>>>>>>>>>>>>>>> data. It enables systems to process and transport data more efficiently."
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>> On Sun, May 16, 2021 at 11:35 AM Wes McKinney <wesmckinn@gmail.com> wrote:
> > >> > >>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>> Here's what's there now:
> > >> > >>>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development platform for in-memory
> > >> > >>>>>>>>>>>>>>>>>>> data. It specifies a standardized language-independent columnar memory
> > >> > >>>>>>>>>>>>>>>>>>> format for flat and hierarchical data, organized for efficient
> > >> > >>>>>>>>>>>>>>>>>>> analytic operations on modern hardware. It also provides computational
> > >> > >>>>>>>>>>>>>>>>>>> libraries and zero-copy streaming messaging and interprocess
> > >> > >>>>>>>>>>>>>>>>>>> communication…"
> > >> > >>>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>> How about something shorter like
> > >> > >>>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development platform for in-memory
> > >> > >>>>>>>>>>>>>>>>>>> data. It enables systems to process and transport data faster."
> > >> > >>>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>> Suggestions / refinements from others welcome
> > >> > >>>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>> On Sat, May 15, 2021 at 9:12 PM Dominik Moritz <domoritz@cmu.edu> wrote:
> > >> > >>>>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>>> Super minor issue but could someone make the description on GitHub
> > >> > >>>>>>>>>>>>>>>>>>>> shorter?
> > >> > >>>>>>>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>>>>>>> GitHub puts the description into the title of the page and makes it hard
> > >> > >>>>>>>>>>>>>>>>>>>> to find it in URL autocomplete.
> > >> > >>>
> > >> > >>> --
> > >> > >>> Adam Hooper
> > >> > >>> +1-514-882-9694
> > >> > >>> http://adamhooper.com
> > >> > >>>
> > >> >
>

Re: Long title on github page

Posted by Wes McKinney <we...@gmail.com>.
Thanks Kou! I have updated the description using .asf.yaml. Appreciate
everyone giving thought to this!
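
For anyone following along, here is a minimal sketch of what such an
.asf.yaml entry could look like. It is an illustration, not the committed
file: it assumes the one-line description proposed earlier in this thread
is the one that was used, and it shows only the `github: description:` key
mentioned in Kou's message.

```yaml
# .asf.yaml at the repository root (illustrative sketch, not the committed file)
github:
  # Assumed value: the one-line description proposed earlier in this thread.
  description: "Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing."
```

Keeping the description to one short sentence also addresses the original
complaint, since GitHub puts the description into the page title.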

On Thu, Jun 10, 2021 at 8:13 PM Sutou Kouhei <ko...@clear-code.com> wrote:
>
> It seems that we can use .asf.yaml to set the description on
> GitHub:
>
> https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features#Git.asf.yamlfeatures-GitHubsettings
>
> github:
>   description: "Apache Arrow is ..."
>
> In <CA...@mail.gmail.com>
>   "Re: Long title on github page" on Thu, 10 Jun 2021 17:44:57 -0500,
>   Wes McKinney <we...@gmail.com> wrote:
>
> > I'll wait a day or two for more feedback to percolate and then ask
> > Infra to change the description on GitHub.
> >
> > On Thu, Jun 10, 2021 at 4:47 PM Adam Lippai <ad...@rigo.sk> wrote:
> >>
> >> +1
> >>
> >> On Thu, Jun 10, 2021, 23:38 Antoine Pitrou <an...@python.org> wrote:
> >>
> >> >
> >> > Sounds good enough to me.
> >> >
> >> >
> >> > Le 10/06/2021 à 23:35, Wes McKinney a écrit :
> >> > > I hate to reopen this can of worms again, but here is my effort to
> >> > > synthesize feedback:
> >> > >
> >> > > "Apache Arrow is a multi-language toolbox for accelerated data
> >> > > interchange and in-memory processing."
> >> > >
> >> > > On Thu, Jun 10, 2021 at 12:37 PM Dominik Moritz <do...@apache.org> wrote:
> >> > >>
> >> > >> I thought there were some good suggestions in this thread. @Wes, did you
> >> > >> find a description you liked?
> >> > >>
> >> > >> On May 18, 2021 at 06:24:47, Adam Hooper <ad...@adamhooper.com> wrote:
> >> > >>
> >> > >>> Poll question: why did you choose Arrow?
> >> > >>>
> >> > >>> Personally: I researched Arrow because it's a spec for IPC. (My requirement
> >> > >>> was: "wrap computations in a separate process.") I chose Arrow for its
> >> > >>> community and ecosystem -- in other words, because my peers chose it.
> >> > >>>
> >> > >>> I happen to use the compute kernel and Parquet capabilities every day; but
> >> > >>> they did not sway me at all. I would choose Arrow if it were nothing but
> >> > >>> this spec and this community. (I chose HTML, after all.)
> >> > >>>
> >> > >>> I see the *code* as one enormous proof that the *spec* is good, and as a
> >> > >>> collection of examples and best practices.
> >> > >>>
> >> > >>> ... so a great pitch to me would be: "Apache Arrow is a data format and
> >> > >>> toolbox for efficient in-memory processing."
> >> > >>>
> >> > >>> Enjoy life,
> >> > >>> Adam
> >> > >>>
> >> > >>> On Tue, May 18, 2021 at 2:38 AM Aldrin <ak...@ucsc.edu.invalid> wrote:
> >> > >>>
> >> > >>> "Apache Arrow is a data processing library that also provides a uniform,
> >> > >>> efficient interface for data systems."
> >> > >>>
> >> > >>> This probably still isn't quite right, I imagine the bit about "for data
> >> > >>> systems" needs some addition (maybe "for transport between data systems")?
> >> > >>>
> >> > >>> My primary motivators:
> >> > >>>
> >> > >>>     - "A data processing library":
> >> > >>>        - Arrow provides many language bindings, but ultimately they're all
> >> > >>>        part of the same "library ecosystem", which I think is fine to capture in
> >> > >>>        "library"
> >> > >>>        - A main goal of arrow is for processing to be fast, whatever that
> >> > >>>        processing may be
> >> > >>>     - "uniform, efficient interface for data systems":
> >> > >>>        - Arrow provides (or tries to) a cohesive ("uniform") interface for
> >> > >>>        data processing (although it has several APIs to do this)
> >> > >>>        - Also, IMO, a motivation for arrow was a format and library to
> >> > >>>        facilitate processing, but that provided functions and interfaces to easily
> >> > >>>        translate into optimized data formats used by disparate data systems
> >> > >>>        (cassandra, hadoop, etc.).
> >> > >>>        - Arrow tries to be transparently zero-copy, which is part of the
> >> > >>>        interface for efficiency
> >> > >>>     - Arrow certainly has a data format, but that format is the crux of the
> >> > >>>     interface (IMO). However, it also makes using other formats easy (via
> >> > >>>     filesystem API and parquet reader/writers, etc.). So, focusing on the data
> >> > >>>     format seems unnecessary in such a terse description.
> >> > >>>
> >> > >>> Aldrin Montana
> >> > >>> Computer Science PhD Student
> >> > >>> UC Santa Cruz
> >> > >>>
> >> > >>> On Mon, May 17, 2021 at 5:07 PM Weston Pace <we...@gmail.com> wrote:
> >> > >>>
> >> > >>>> I'd avoid the word "structured" as it is somewhat ill-defined.
> >> > >>>>
> >> > >>>> On Mon, May 17, 2021 at 12:37 PM Mauricio Vargas
> >> > >>>> <ma...@ursacomputing.com> wrote:
> >> > >>>>>
> >> > >>>>> more marketed:
> >> > >>>>> How about: "Apache Arrow is a format and language-agnostic library focused
> >> > >>>>> on efficient sharing and processing of structured data."
> >> > >>>>>
> >> > >>>>> On Mon, May 17, 2021 at 6:25 PM Micah Kornfield <emkornfield@gmail.com> wrote:
> >> > >>>>>
> >> > >>>>>> How about: "Apache Arrow is a collection of specifications, cross language
> >> > >>>>>> libraries and applications focused on efficient sharing and processing of
> >> > >>>>>> structured data."
> >> > >>>>>>
> >> > >>>>>> On Mon, May 17, 2021 at 3:06 PM Wes McKinney <we...@gmail.com> wrote:
> >> > >>>>>>
> >> > >>>>>>> On Mon, May 17, 2021 at 4:58 PM Weston Pace <weston.pace@gmail.com> wrote:
> >> > >>>>>>>>
> >> > >>>>>>>>> “Apache Arrow is a format and compute kernel for in-memory data”
> >> > >>>>>>>>
> >> > >>>>>>>> I like this but no one ever knows what "in-memory" means (or they just
> >> > >>>>>>>> think 'data is always in memory'). How about...
> >> > >>>>>>>>
> >> > >>>>>>>> "Apache Arrow is a format and compute kernel for zero-copy processing
> >> > >>>>>>>> and sharing of data."
> >> > >>>>>>>>
> >> > >>>>>>>> or...
> >> > >>>>>>>>
> >> > >>>>>>>> "Apache Arrow is a format and compute kernel for processing and
> >> > >>>>>>>> sharing data without serialization overhead."
> >> > >>>>>>>
> >> > >>>>>>> A few issues with this:
> >> > >>>>>>>
> >> > >>>>>>> * Multiple PL aspect unclear (is it a single piece of software, or
> >> > >>>>>>> multiple pieces of software?)
> >> > >>>>>>> * Development platform aspect unclear
> >> > >>>>>>>
> >> > >>>>>>> I see that some people don't like the word "platform". Some people
> >> > >>>>>>> come to this project and want to find an end-to-end application,
> >> > >>>>>>> rather than a developer toolkit that they can use to build
> >> > >>>>>>> applications. Perhaps we should be more explicit and use
> >> > >>>>>>> "computational development toolkit" instead of "platform".
> >> > >>>>>>>
> >> > >>>>>>>> Although marshalling[1] would probably be a more precise word it is
> >> > >>>>>>>> not as well known.
> >> > >>>>>>>>
> >> > >>>>>>>> [1] https://en.wikipedia.org/wiki/Marshalling_(computer_science)
> >> > >>>>>>>>
> >> > >>>>>>>> On Mon, May 17, 2021 at 9:36 AM Mauricio Vargas
> >> > >>>>>>>> <ma...@ursacomputing.com> wrote:
> >> > >>>>>>>>>
> >> > >>>>>>>>> a few ideas
> >> > >>>>>>>>>
> >> > >>>>>>>>> github.com/apache/arrow - Apache Arrow is an efficient library for big data
> >> > >>>>>>>>> processing and sharing
> >> > >>>>>>>>>
> >> > >>>>>>>>> github.com/apache/arrow - Apache Arrow is a computational tool for
> >> > >>>>>>>>> processing, storing and sharing large datasets
> >> > >>>>>>>>>
> >> > >>>>>>>>> github.com/apache/arrow - Apache Arrow is a fast and simple library for
> >> > >>>>>>>>> big data analytics
> >> > >>>>>>>>>
> >> > >>>>>>>>> *github.com/apache/arrow <http://github.com/apache/arrow> - Apache Arrow is
> >> > >>>>>>>>> a powerful workhorse for analytic operations on modern hardware*
> >> > >>>>>>>>>
> >> > >>>>>>>>> On Mon, May 17, 2021 at 3:13 PM Julian Hyde <jhyde.apache@gmail.com> wrote:
> >> > >>>>>>>>>
> >> > >>>>>>>>>> Alright, well, whatever it is, it must fit into one breath. If the
> >> > >>>>>>>>>> high-concept pitch is successful, people will stick around for the full
> >> > >>>>>>>>>> pitch.
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> Words such as “platform” and “enable” are noise. You say “platform”, they
> >> > >>>>>>>>>> start to say “what exactly do you mean by platform”, the elevator doors
> >> > >>>>>>>>>> open, and they’re gone.
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> “Apache Arrow is a format and compute kernel for in-memory data”
> >> > >>>>>>>>>>
> >> > >>>>>>>>>>> On May 17, 2021, at 12:03 PM, Eduardo Ponce <edponce00@gmail.com> wrote:
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> One more suggestion for the bucket:
> >> > >>>>>>>>>>> "Apache Arrow is a computational platform for efficient in-memory data
> >> > >>>>>>>>>>> representation and processing."
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> On Mon, May 17, 2021 at 2:49 PM Wes McKinney <wesmckinn@gmail.com> wrote:
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>>> I think less is better in the description, but unfortunately the
> >> > >>>>>>>>>>>> association of Arrow as being "just a data format" has been actively
> >> > >>>>>>>>>>>> harmful in some ways to community growth. We have a data format, yes,
> >> > >>>>>>>>>>>> but we are also creating a computational platform to go hand-in-hand
> >> > >>>>>>>>>>>> with the data format to make it easier to build fast applications that
> >> > >>>>>>>>>>>> use the data format. So the description needs to capture both of these
> >> > >>>>>>>>>>>> ideas.
> >> > >>>>>>>>>>>>
> >> > >>>>>>>>>>>> On Mon, May 17, 2021 at 12:15 PM Julian Hyde <jhyde.apache@gmail.com> wrote:
> >> > >>>>>>>>>>>>>
> >> > >>>>>>>>>>>>> I think that the “cross-language development platform for” is noise.
> >> > >>>>>>>>>>>>> (I’m sure that JPEG developers think that JPEG is a “cross-language
> >> > >>>>>>>>>>>>> development platform” too. But it isn’t. It is an image format.)
> >> > >>>>>>>>>>>>>
> >> > >>>>>>>>>>>>> “Apache Arrow is a data format for efficient in-memory processing.”
> >> > >>>>>>>>>>>>>
> >> > >>>>>>>>>>>>> I’ll note that in marketing speak, we are developing a high-concept
> >> > >>>>>>>>>>>>> pitch [1] here. Every company needs a name, a brand, a high-concept pitch,
> >> > >>>>>>>>>>>>> and 3- or 4-sentence description. But every Apache project needs these too.
> >> > >>>>>>>>>>>>> It’s worth spending the time on the description, also, and then use them in
> >> > >>>>>>>>>>>>> all the places that we describe Arrow.
> >> > >>>>>>>>>>>>>
> >> > >>>>>>>>>>>>> Julian
> >> > >>>>>>>>>>>>>
> >> > >>>>>>>>>>>>> [1] https://www.growthink.com/content/whats-your-high-concept-pitch
> >> > >>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>> On May 17, 2021, at 7:38 AM, Eduardo Ponce <edponce00@gmail.com> wrote:
> >> > >>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>> I agree with Nate's and Brian's suggestions, but would like to add that we
> >> > >>>>>>>>>>>>>> can make it a one-liner for more conciseness and consistency with other
> >> > >>>>>>>>>>>>>> Apache projects.
> >> > >>>>>>>>>>>>>> Apologies if it seems I am going around the suggestions loop again.
> >> > >>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>> "Apache Arrow is a cross-language development platform enabling efficient
> >> > >>>>>>>>>>>>>> in-memory data processing and transport."
> >> > >>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>> On Mon, May 17, 2021 at 10:11 AM Brian Hulette <bhulette@apache.org> wrote:
> >> > >>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> Thank you for bringing this up Dominik. I sampled some of the descriptions
> >> > >>>>>>>>>>>>>>> for other Apache projects I frequent, the ones with a meaningful
> >> > >>>>>>>>>>>>>>> description have a single sentence:
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> github.com/apache/spark - Apache Spark - A unified analytics engine for
> >> > >>>>>>>>>>>>>>> large-scale data processing
> >> > >>>>>>>>>>>>>>> github.com/apache/beam - Apache Beam is a unified programming model for
> >> > >>>>>>>>>>>>>>> Batch and Streaming
> >> > >>>>>>>>>>>>>>> github.com/apache/avro - Apache Avro is a data serialization system
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> Several others (Flink, Hadoop, ...) just have "[Mirror of] Apache <name>"
> >> > >>>>>>>>>>>>>>> as the description.
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> +1 for Nate's suggestion "Apache Arrow is a cross-language development
> >> > >>>>>>>>>>>>>>> platform for in-memory data. It enables systems to process and transport
> >> > >>>>>>>>>>>>>>> data more efficiently."
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> On Mon, May 17, 2021 at 5:23 AM Wes McKinney <wesmckinn@gmail.com> wrote:
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>>> It's probably best for description to limit mentions of specific
> >> > >>>>>>>>>>>>>>>> features. There are some high level features mentioned in the
> >> > >>>>>>>>>>>>>>>> description now ("computational libraries and zero-copy streaming
> >> > >>>>>>>>>>>>>>>> messaging and interprocess communication"), but now in 2021 since the
> >> > >>>>>>>>>>>>>>>> project has grown so much, it could leave people with a limited view
> >> > >>>>>>>>>>>>>>>> of what they might find here.
> >> > >>>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>>> On Mon, May 17, 2021 at 12:14 AM Mauricio Vargas
> >> > >>>>>>>>>>>>>>>> <ma...@ursacomputing.com> wrote:
> >> > >>>>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>>>> How about
> >> > >>>>>>>>>>>>>>>>> 'Apache Arrow is a cross-language development platform for in-memory data.
> >> > >>>>>>>>>>>>>>>>> It enables systems to process and transport data efficiently, providing a
> >> > >>>>>>>>>>>>>>>>> simple and fast library for partitioning of large tables'?
> >> > >>>>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>>>> Sorry for the delay, long election day
> >> > >>>>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>>>> On Sun, May 16, 2021, 2:27 PM Nate Bauernfeind <natebauernfeind@deephaven.io> wrote:
> >> > >>>>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>>>>> Suggestion: faster -> more efficiently
> >> > >>>>>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development platform for in-memory
> >> > >>>>>>>>>>>>>>>>>> data. It enables systems to process and transport data more efficiently."
> >> > >>>>>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>>>>> On Sun, May 16, 2021 at 11:35 AM Wes McKinney <wesmckinn@gmail.com> wrote:
> >> > >>>>>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>>>>>> Here's what's there now:
> >> > >>>>>>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development platform for in-memory
> >> > >>>>>>>>>>>>>>>>>>> data. It specifies a standardized language-independent columnar memory
> >> > >>>>>>>>>>>>>>>>>>> format for flat and hierarchical data, organized for efficient
> >> > >>>>>>>>>>>>>>>>>>> analytic operations on modern hardware. It also provides computational
> >> > >>>>>>>>>>>>>>>>>>> libraries and zero-copy streaming messaging and interprocess
> >> > >>>>>>>>>>>>>>>>>>> communication…"
> >> > >>>>>>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>>>>>> How about something shorter like
> >> > >>>>>>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development platform for in-memory
> >> > >>>>>>>>>>>>>>>>>>> data. It enables systems to process and transport data faster."
> >> > >>>>>>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>>>>>> Suggestions / refinements from others welcome
> >> > >>>>>>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>>>>>> On Sat, May 15, 2021 at 9:12 PM Dominik Moritz <domoritz@cmu.edu> wrote:
> >> > >>>>>>>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>>>>>>> Super minor issue but could someone make the description on GitHub
> >> > >>>>>>>>>>>>>>>>>>>> shorter?
> >> > >>>>>>>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>>>>>>> GitHub puts the description into the title of the page and makes it hard
> >> > >>>>>>>>>>>>>>>>>>>> to find it in URL autocomplete.
> >> > >>>
> >> > >>> --
> >> > >>> Adam Hooper
> >> > >>> +1-514-882-9694
> >> > >>> http://adamhooper.com
> >> > >>>
> >> >

Re: Long title on github page

Posted by Sutou Kouhei <ko...@clear-code.com>.
It seems that we can use .asf.yaml to set the description on
GitHub:

https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features#Git.asf.yamlfeatures-GitHubsettings

github:
  description: "Apache Arrow is ..."

In <CA...@mail.gmail.com>
  "Re: Long title on github page" on Thu, 10 Jun 2021 17:44:57 -0500,
  Wes McKinney <we...@gmail.com> wrote:

> I'll wait a day or two for more feedback to percolate and then ask
> Infra to change the description on GitHub.
> 
> On Thu, Jun 10, 2021 at 4:47 PM Adam Lippai <ad...@rigo.sk> wrote:
>>
>> +1
>>
>> On Thu, Jun 10, 2021, 23:38 Antoine Pitrou <an...@python.org> wrote:
>>
>> >
>> > Sounds good enough to me.
>> >
>> >
>> > Le 10/06/2021 à 23:35, Wes McKinney a écrit :
>> > > I hate to reopen this can of worms again, but here is my effort to
>> > > synthesize feedback:
>> > >
>> > > "Apache Arrow is a multi-language toolbox for accelerated data
>> > > interchange and in-memory processing."
>> > >
>> > > On Thu, Jun 10, 2021 at 12:37 PM Dominik Moritz <do...@apache.org> wrote:
>> > >>
>> > >> I thought there were some good suggestions in this thread. @Wes, did you
>> > >> find a description you liked?
>> > >>
>> > >> On May 18, 2021 at 06:24:47, Adam Hooper <ad...@adamhooper.com> wrote:
>> > >>
>> > >>> Poll question: why did you choose Arrow?
>> > >>>
>> > >>> Personally: I researched Arrow because it's a spec for IPC. (My requirement
>> > >>> was: "wrap computations in a separate process.") I chose Arrow for its
>> > >>> community and ecosystem -- in other words, because my peers chose it.
>> > >>>
>> > >>> I happen to use the compute kernel and Parquet capabilities every day; but
>> > >>> they did not sway me at all. I would choose Arrow if it were nothing but
>> > >>> this spec and this community. (I chose HTML, after all.)
>> > >>>
>> > >>> I see the *code* as one enormous proof that the *spec* is good, and as a
>> > >>> collection of examples and best practices.
>> > >>>
>> > >>> ... so a great pitch to me would be: "Apache Arrow is a data format and
>> > >>> toolbox for efficient in-memory processing."
>> > >>>
>> > >>> Enjoy life,
>> > >>> Adam
>> > >>>
>> > >>> On Tue, May 18, 2021 at 2:38 AM Aldrin <ak...@ucsc.edu.invalid>
>> > wrote:
>> > >>>
>> > >>> "Apache Arrow is a data processing library that also provides a
>> > uniform,
>> > >>>
>> > >>> efficient interface for data systems."
>> > >>>
>> > >>>
>> > >>> This probably still isn't quite right, I imagine the bit about "for
>> > data
>> > >>>
>> > >>> systems" needs some addition (maybe "for transport between data
>> > systems")?
>> > >>>
>> > >>>
>> > >>> My primary motivators:
>> > >>>
>> > >>>
>> > >>>     - "A data processing library":
>> > >>>
>> > >>>        - Arrow provides many language bindings, but ultimately they're
>> > all
>> > >>>
>> > >>>        part of the same "library ecosystem", which I think is fine to
>> > >>>
>> > >>> capture in
>> > >>>
>> > >>>        "library"
>> > >>>
>> > >>>        - A main goal of arrow is for processing to be fast, whatever
>> > that
>> > >>>
>> > >>>        processing may be
>> > >>>
>> > >>>     - "uniform, efficient interface for data systems":
>> > >>>
>> > >>>        - Arrow provides (or tries to) a cohesive ("uniform")
>> > interface for
>> > >>>
>> > >>>        data processing (although it has several APIs to do this)
>> > >>>
>> > >>>        - Also, IMO, a motivation for arrow was a format and library to
>> > >>>
>> > >>>        facilitate processing, but that provided functions and
>> > >>>
>> > >>> interfaces to easily
>> > >>>
>> > >>>        translate into optimized data formats used by disparate data
>> > systems
>> > >>>
>> > >>>        (cassandra, hadoop, etc.).
>> > >>>
>> > >>>        - Arrow tries to be transparently zero-copy, which is part of
>> > the
>> > >>>
>> > >>>        interface for efficiency
>> > >>>
>> > >>>     - Arrow certainly has a data format, but that format is the crux
>> > of the
>> > >>>
>> > >>>     interface (IMO). However, it also makes using other formats easy
>> > (via
>> > >>>
>> > >>>     filesystem API and parquet reader/writers, etc.). So, focusing on
>> > the
>> > >>>
>> > >>> data
>> > >>>
>> > >>>     format seems unnecessary in such a terse description.
>> > >>>
>> > >>>
>> > >>>
>> > >>> Aldrin Montana
>> > >>>
>> > >>> Computer Science PhD Student
>> > >>>
>> > >>> UC Santa Cruz
>> > >>>
>> > >>>
>> > >>>
>> > >>> On Mon, May 17, 2021 at 5:07 PM Weston Pace <we...@gmail.com>
>> > wrote:
>> > >>>
>> > >>>
>> > >>>> I'd avoid the word "structured" as it is somewhat ill-defined.
>> > >>>
>> > >>>>
>> > >>>
>> > >>>> On Mon, May 17, 2021 at 12:37 PM Mauricio Vargas
>> > >>>
>> > >>>> <ma...@ursacomputing.com> wrote:
>> > >>>
>> > >>>>>
>> > >>>
>> > >>>>> more marketed:
>> > >>>
>> > >>>>> How about: "Apache Arrow is a format and language-agnostic library
>> > >>>
>> > >>>> focused
>> > >>>
>> > >>>>> on efficient sharing and processing of structured data."
>> > >>>
>> > >>>>>
>> > >>>
>> > >>>>> On Mon, May 17, 2021 at 6:25 PM Micah Kornfield <
>> > emkornfield@gmail.com
>> > >>>
>> > >>>>
>> > >>>
>> > >>>>> wrote:
>> > >>>
>> > >>>>>
>> > >>>
>> > >>>>>> How about: "Apache Arrow is a collection of specifications, cross
>> > >>>
>> > >>>> language
>> > >>>
>> > >>>>>> libraries and applications focused on efficient sharing and
>> > >>>
>> > >>> processing
>> > >>>
>> > >>>> of
>> > >>>
>> > >>>>>> structured data."
>> > >>>
>> > >>>>>>
>> > >>>
>> > >>>>>> On Mon, May 17, 2021 at 3:06 PM Wes McKinney <we...@gmail.com>
>> > >>>
>> > >>>> wrote:
>> > >>>
>> > >>>>>>
>> > >>>
>> > >>>>>>> On Mon, May 17, 2021 at 4:58 PM Weston Pace <weston.pace@gmail.com
>> > >>>
>> > >>>>
>> > >>>
>> > >>>>>> wrote:
>> > >>>
>> > >>>>>>>>
>> > >>>
>> > >>>>>>>>> “Apache Arrow is a format and compute kernel for in-memory
>> > >>>
>> > >>> data”
>> > >>>
>> > >>>>>>>>
>> > >>>
>> > >>>>>>>> I like this but no one ever knows what "in-memory" means (or they
>> > >>>
>> > >>>> just
>> > >>>
>> > >>>>>>>> think 'data is always in memory').  How about...
>> > >>>
>> > >>>>>>>>
>> > >>>
>> > >>>>>>>> "Apache Arrow is a format and compute kernel for zero-copy
>> > >>>
>> > >>>> processing
>> > >>>
>> > >>>>>>>> and sharing of data."
>> > >>>
>> > >>>>>>>>
>> > >>>
>> > >>>>>>>> or...
>> > >>>
>> > >>>>>>>>
>> > >>>
>> > >>>>>>>> "Apache Arrow is a format and compute kernel for processing and
>> > >>>
>> > >>>>>>>> sharing data without serialization overhead."
>> > >>>
>> > >>>>>>>
>> > >>>
>> > >>>>>>> A few issues with this:
>> > >>>
>> > >>>>>>>
>> > >>>
>> > >>>>>>> * Multiple PL aspect unclear (is it a single piece of software, or
>> > >>>
>> > >>>>>>> multiple pieces of software?)
>> > >>>
>> > >>>>>>> * Development platform aspect unclear
>> > >>>
>> > >>>>>>>
>> > >>>
>> > >>>>>>> I see that some people don't like the word "platform". Some people
>> > >>>
>> > >>>>>>> come to this project and want to find an end-to-end application,
>> > >>>
>> > >>>>>>> rather than a developer toolkit that they can use to build
>> > >>>
>> > >>>>>>> applications. Perhaps we should be more explicit and use
>> > >>>
>> > >>>>>>> "computational development toolkit" instead of "platform".
>> > >>>
>> > >>>>>>>
>> > >>>
>> > >>>>>>>> Although marshalling[1] would probably be a more precise word it
>> > >>>
>> > >>> is
>> > >>>
>> > >>>>>>>> not as well known.
>> > >>>
>> > >>>>>>>>
>> > >>>
>> > >>>>>>>> [1] https://en.wikipedia.org/wiki/Marshalling_(computer_science)
>> > >>>
>> > >>>>>>>>
>> > >>>
>> > >>>>>>>> On Mon, May 17, 2021 at 9:36 AM Mauricio Vargas
>> > >>>
>> > >>>>>>>> <ma...@ursacomputing.com> wrote:
>> > >>>
>> > >>>>>>>>>
>> > >>>
>> > >>>>>>>>> a few ideas
>> > >>>
>> > >>>>>>>>>
>> > >>>
>> > >>>>>>>>> github.com/apache/arrow - Apache Arrow is an efficient library
>> > >>>
>> > >>>> for
>> > >>>
>> > >>>>>>> big data
>> > >>>
>> > >>>>>>>>> processing and sharing
>> > >>>
>> > >>>>>>>>>
>> > >>>
>> > >>>>>>>>> github.com/apache/arrow - Apache Arrow is a computational tool
>> > >>>
>> > >>>> for
>> > >>>
>> > >>>>>>>>> processing, storing and sharing large datasets
>> > >>>
>> > >>>>>>>>>
>> > >>>
>> > >>>>>>>>> github.com/apache/arrow - Apache Arrow is a fast and simple
>> > >>>
>> > >>>> library
>> > >>>
>> > >>>>>>> for
>> > >>>
>> > >>>>>>>>> big data analytics
>> > >>>
>> > >>>>>>>>>
>> > >>>
>> > >>>>>>>>> *github.com/apache/arrow -
>> > >>>
>> > >>>> Apache
>> > >>>
>> > >>>>>>> Arrow is
>> > >>>
>> > >>>>>>>>> a powerful workhorse for analytic operations on modern
>> > >>>
>> > >>> hardware*
>> > >>>
>> > >>>>>>>>>
>> > >>>
>> > >>>>>>>>>
>> > >>>
>> > >>>>>>>>> On Mon, May 17, 2021 at 3:13 PM Julian Hyde <
>> > >>>
>> > >>>> jhyde.apache@gmail.com>
>> > >>>
>> > >>>>>>> wrote:
>> > >>>
>> > >>>>>>>>>
>> > >>>
>> > >>>>>>>>>> Alright, well, whatever it is, it must fit into one breath.
>> > >>>
>> > >>> If
>> > >>>
>> > >>>> the
>> > >>>
>> > >>>>>>>>>> high-concept pitch is successful, people will stick around
>> > >>>
>> > >>> for
>> > >>>
>> > >>>> the
>> > >>>
>> > >>>>>>> full
>> > >>>
>> > >>>>>>>>>> pitch.
>> > >>>
>> > >>>>>>>>>>
>> > >>>
>> > >>>>>>>>>> Words such as “platform” and “enable” are noise. You say
>> > >>>
>> > >>>>>> “platform”,
>> > >>>
>> > >>>>>>> they
>> > >>>
>> > >>>>>>>>>> start to say “what exactly do you mean by platform”, the
>> > >>>
>> > >>>> elevator
>> > >>>
>> > >>>>>>> doors
>> > >>>
>> > >>>>>>>>>> open, and they’re gone.
>> > >>>
>> > >>>>>>>>>>
>> > >>>
>> > >>>>>>>>>> “Apache Arrow is a format and compute kernel for in-memory
>> > >>>
>> > >>>> data”
>> > >>>
>> > >>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>> On May 17, 2021, at 12:03 PM, Eduardo Ponce <
>> > >>>
>> > >>>> edponce00@gmail.com
>> > >>>
>> > >>>>>>>
>> > >>>
>> > >>>>>>> wrote:
>> > >>>
>> > >>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>> One more suggestion for the bucket:
>> > >>>
>> > >>>>>>>>>>> "Apache Arrow is a computational platform for efficient
>> > >>>
>> > >>>> in-memory
>> > >>>
>> > >>>>>>> data
>> > >>>
>> > >>>>>>>>>>> representation and processing."
>> > >>>
>> > >>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>> On Mon, May 17, 2021 at 2:49 PM Wes McKinney <
>> > >>>
>> > >>>>>> wesmckinn@gmail.com>
>> > >>>
>> > >>>>>>>>>> wrote:
>> > >>>
>> > >>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>> I think less is better in the description, but
>> > >>>
>> > >>>> unfortunately the
>> > >>>
>> > >>>>>>>>>>>> association of Arrow as being "just a data format" has
>> > >>>
>> > >>> been
>> > >>>
>> > >>>>>>> actively
>> > >>>
>> > >>>>>>>>>>>> harmful in some ways to community growth. We have a data
>> > >>>
>> > >>>> format,
>> > >>>
>> > >>>>>>> yes,
>> > >>>
>> > >>>>>>>>>>>> but we are also creating a computational platform to go
>> > >>>
>> > >>>>>>> hand-in-hand
>> > >>>
>> > >>>>>>>>>>>> with the data format to make it easier to build fast
>> > >>>
>> > >>>>>> applications
>> > >>>
>> > >>>>>>> that
>> > >>>
>> > >>>>>>>>>>>> use the data format. So the description needs to capture
>> > >>>
>> > >>>> both of
>> > >>>
>> > >>>>>>> these
>> > >>>
>> > >>>>>>>>>>>> ideas.
>> > >>>
>> > >>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>> On Mon, May 17, 2021 at 12:15 PM Julian Hyde <
>> > >>>
>> > >>>>>>> jhyde.apache@gmail.com>
>> > >>>
>> > >>>>>>>>>>>> wrote:
>> > >>>
>> > >>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>> I think that the “cross-language development platform
>> > >>>
>> > >>> for”
>> > >>>
>> > >>>> is
>> > >>>
>> > >>>>>>> noise.
>> > >>>
>> > >>>>>>>>>>>> (I’m sure that JPEG developers think that JPEG is a
>> > >>>
>> > >>>>>>> “cross-language
>> > >>>
>> > >>>>>>>>>>>> development platform” too. But it isn’t. It is an image
>> > >>>
>> > >>>> format.)
>> > >>>
>> > >>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>> "Apache Arrow is a data format for efficient in-memory
>> > >>>
>> > >>>>>> processing.”
>> > >>>
>> > >>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>> I’ll note that in marketing speak, we are developing a
>> > >>>
>> > >>>>>>> high-concept
>> > >>>
>> > >>>>>>>>>>>> pitch [1] here. Every company needs a name, a brand, a
>> > >>>
>> > >>>>>>> high-concept
>> > >>>
>> > >>>>>>>>>> pitch,
>> > >>>
>> > >>>>>>>>>>>> and 3- or 4-sentence description. But every Apache project
>> > >>>
>> > >>>> needs
>> > >>>
>> > >>>>>>> these
>> > >>>
>> > >>>>>>>>>> too.
>> > >>>
>> > >>>>>>>>>>>> It’s worth spending the time on the description, also, and
>> > >>>
>> > >>>> then
>> > >>>
>> > >>>>>>> use
>> > >>>
>> > >>>>>>>>>> them in
>> > >>>
>> > >>>>>>>>>>>> all the places that we describe Arrow.
>> > >>>
>> > >>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>> Julian
>> > >>>
>> > >>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>> [1]
>> > >>>
>> > >>>>>>> https://www.growthink.com/content/whats-your-high-concept-pitch
>> > >>>
>> > >>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>> On May 17, 2021, at 7:38 AM, Eduardo Ponce <
>> > >>>
>> > >>>>>> edponce00@gmail.com
>> > >>>
>> > >>>>>>>>
>> > >>>
>> > >>>>>>>>>>>> wrote:
>> > >>>
>> > >>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>> I agree with Nate's and Brian's suggestions, but would
>> > >>>
>> > >>>> like to
>> > >>>
>> > >>>>>>> add
>> > >>>
>> > >>>>>>>>>>>> that we
>> > >>>
>> > >>>>>>>>>>>>>> can make it a one-liner for more conciseness and
>> > >>>
>> > >>>> consistency
>> > >>>
>> > >>>>>>> with
>> > >>>
>> > >>>>>>>>>> other
>> > >>>
>> > >>>>>>>>>>>>>> Apache projects.
>> > >>>
>> > >>>>>>>>>>>>>> Apologies if it seems I am going around the suggestions
>> > >>>
>> > >>>> loop
>> > >>>
>> > >>>>>>> again.
>> > >>>
>> > >>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>> "Apache Arrow is a cross-language development platform
>> > >>>
>> > >>>>>> enabling
>> > >>>
>> > >>>>>>>>>>>> efficient
>> > >>>
>> > >>>>>>>>>>>>>> in-memory data processing and transport."
>> > >>>
>> > >>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>> On Mon, May 17, 2021 at 10:11 AM Brian Hulette <
>> > >>>
>> > >>>>>>> bhulette@apache.org>
>> > >>>
>> > >>>>>>>>>>>> wrote:
>> > >>>
>> > >>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>> Thank you for bringing this up Dominik. I sampled some
>> > >>>
>> > >>>> of the
>> > >>>
>> > >>>>>>>>>>>> descriptions
>> > >>>
>> > >>>>>>>>>>>>>>> for other Apache projects I frequent, the ones with a
>> > >>>
>> > >>>>>>> meaningful
>> > >>>
>> > >>>>>>>>>>>>>>> description have a single sentence:
>> > >>>
>> > >>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>> github.com/apache/spark - Apache Spark - A unified
>> > >>>
>> > >>>> analytics
>> > >>>
>> > >>>>>>> engine
>> > >>>
>> > >>>>>>>>>>>> for
>> > >>>
>> > >>>>>>>>>>>>>>> large-scale data processing
>> > >>>
>> > >>>>>>>>>>>>>>> github.com/apache/beam - Apache Beam is a unified
>> > >>>
>> > >>>>>> programming
>> > >>>
>> > >>>>>>> model
>> > >>>
>> > >>>>>>>>>>>> for
>> > >>>
>> > >>>>>>>>>>>>>>> Batch and Streaming
>> > >>>
>> > >>>>>>>>>>>>>>> github.com/apache/avro - Apache Avro is a data
>> > >>>
>> > >>>> serialization
>> > >>>
>> > >>>>>>> system
>> > >>>
>> > >>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>> Several others (Flink, Hadoop, ...) just have "[Mirror
>> > >>>
>> > >>>> of]
>> > >>>
>> > >>>>>>> Apache
>> > >>>
>> > >>>>>>>>>>>> <name>"
>> > >>>
>> > >>>>>>>>>>>>>>> as the description.
>> > >>>
>> > >>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>> +1 for Nate's suggestion "Apache Arrow is a
>> > >>>
>> > >>>> cross-language
>> > >>>
>> > >>>>>>>>>> development
>> > >>>
>> > >>>>>>>>>>>>>>> platform for in-memory data. It enables systems to
>> > >>>
>> > >>>> process
>> > >>>
>> > >>>>>> and
>> > >>>
>> > >>>>>>>>>>>> transport
>> > >>>
>> > >>>>>>>>>>>>>>> data more efficiently."
>> > >>>
>> > >>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>> On Mon, May 17, 2021 at 5:23 AM Wes McKinney <
>> > >>>
>> > >>>>>>> wesmckinn@gmail.com>
>> > >>>
>> > >>>>>>>>>>>> wrote:
>> > >>>
>> > >>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>> It's probably best for the description to limit mentions
>> > >>>
>> > >>> of
>> > >>>
>> > >>>>>>> specific
>> > >>>
>> > >>>>>>>>>>>>>>>> features. There are some high level features mentioned
>> > >>>
>> > >>>> in
>> > >>>
>> > >>>>>> the
>> > >>>
>> > >>>>>>>>>>>>>>>> description now ("computational libraries and
>> > >>>
>> > >>> zero-copy
>> > >>>
>> > >>>>>>> streaming
>> > >>>
>> > >>>>>>>>>>>>>>>> messaging and interprocess communication"), but now in
>> > >>>
>> > >>>> 2021
>> > >>>
>> > >>>>>>> since
>> > >>>
>> > >>>>>>>>>> the
>> > >>>
>> > >>>>>>>>>>>>>>>> project has grown so much, it could leave people with
>> > >>>
>> > >>> a
>> > >>>
>> > >>>>>>> limited view
>> > >>>
>> > >>>>>>>>>>>>>>>> of what they might find here.
>> > >>>
>> > >>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>> On Mon, May 17, 2021 at 12:14 AM Mauricio Vargas
>> > >>>
>> > >>>>>>>>>>>>>>>> <ma...@ursacomputing.com> wrote:
>> > >>>
>> > >>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>> How about
>> > >>>
>> > >>>>>>>>>>>>>>>>> 'Apache Arrow is a cross-language development
>> > >>>
>> > >>> platform
>> > >>>
>> > >>>> for
>> > >>>
>> > >>>>>>>>>> in-memory
>> > >>>
>> > >>>>>>>>>>>>>>>> data.
>> > >>>
>> > >>>>>>>>>>>>>>>>> It enables systems to process and transport data
>> > >>>
>> > >>>>>> efficiently,
>> > >>>
>> > >>>>>>>>>>>>>>> providing a
>> > >>>
>> > >>>>>>>>>>>>>>>>> simple and fast library for partitioning of large
>> > >>>
>> > >>>> tables'?
>> > >>>
>> > >>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>> Sorry for the delay, long election day
>> > >>>
>> > >>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>> On Sun, May 16, 2021, 2:27 PM Nate Bauernfeind <
>> > >>>
>> > >>>>>>>>>>>>>>>> natebauernfeind@deephaven.io>
>> > >>>
>> > >>>>>>>>>>>>>>>>> wrote:
>> > >>>
>> > >>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>> Suggestion: faster -> more efficiently
>> > >>>
>> > >>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development
>> > >>>
>> > >>>> platform for
>> > >>>
>> > >>>>>>>>>>>> in-memory
>> > >>>
>> > >>>>>>>>>>>>>>>>>> data. It enables systems to process and transport
>> > >>>
>> > >>> data
>> > >>>
>> > >>>>>> more
>> > >>>
>> > >>>>>>>>>>>>>>>> efficiently."
>> > >>>
>> > >>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>> On Sun, May 16, 2021 at 11:35 AM Wes McKinney <
>> > >>>
>> > >>>>>>>>>> wesmckinn@gmail.com
>> > >>>
>> > >>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>> wrote:
>> > >>>
>> > >>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>>> Here's what's there now:
>> > >>>
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development
>> > >>>
>> > >>>> platform
>> > >>>
>> > >>>>>> for
>> > >>>
>> > >>>>>>>>>>>>>>> in-memory
>> > >>>
>> > >>>>>>>>>>>>>>>>>>> data. It specifies a standardized
>> > >>>
>> > >>>> language-independent
>> > >>>
>> > >>>>>>> columnar
>> > >>>
>> > >>>>>>>>>>>>>>>> memory
>> > >>>
>> > >>>>>>>>>>>>>>>>>>> format for flat and hierarchical data, organized
>> > >>>
>> > >>> for
>> > >>>
>> > >>>>>>> efficient
>> > >>>
>> > >>>>>>>>>>>>>>>>>>> analytic operations on modern hardware. It also
>> > >>>
>> > >>>> provides
>> > >>>
>> > >>>>>>>>>>>>>>>> computational
>> > >>>
>> > >>>>>>>>>>>>>>>>>>> libraries and zero-copy streaming messaging and
>> > >>>
>> > >>>>>>> interprocess
>> > >>>
>> > >>>>>>>>>>>>>>>>>>> communication…"
>> > >>>
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>>> How about something shorter like
>> > >>>
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development
>> > >>>
>> > >>>> platform
>> > >>>
>> > >>>>>> for
>> > >>>
>> > >>>>>>>>>>>>>>> in-memory
>> > >>>
>> > >>>>>>>>>>>>>>>>>>> data. It enables systems to process and transport
>> > >>>
>> > >>>> data
>> > >>>
>> > >>>>>>> faster."
>> > >>>
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>>> Suggestions / refinements from others welcome
>> > >>>
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>>> On Sat, May 15, 2021 at 9:12 PM Dominik Moritz <
>> > >>>
>> > >>>>>>> domoritz@cmu.edu
>> > >>>
>> > >>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>> wrote:
>> > >>>
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>>>> Super minor issue but could someone make the
>> > >>>
>> > >>>> description
>> > >>>
>> > >>>>>>> on
>> > >>>
>> > >>>>>>>>>>>>>>> GitHub
>> > >>>
>> > >>>>>>>>>>>>>>>>>>> shorter?
>> > >>>
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>>>> GitHub puts the description into the title of the
>> > >>>
>> > >>>> page
>> > >>>
>> > >>>>>>> and makes
>> > >>>
>> > >>>>>>>>>>>>>>> it
>> > >>>
>> > >>>>>>>>>>>>>>>>>> hard
>> > >>>
>> > >>>>>>>>>>>>>>>>>>> to find it in URL autocomplete.
>> > >>>
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>>> --
>> > >>>
>> > >>>>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>
>> > >>>
>> > >>>>>>>>>>
>> > >>>
>> > >>>>>>>
>> > >>>
>> > >>>>>>
>> > >>>
>> > >>>>
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>> --
>> > >>> Adam Hooper
>> > >>> +1-514-882-9694
>> > >>> http://adamhooper.com
>> > >>>
>> >

Re: Long title on github page

Posted by Wes McKinney <we...@gmail.com>.
I'll wait a day or two for more feedback to percolate and then ask
Infra to change the description on GitHub.

On Thu, Jun 10, 2021 at 4:47 PM Adam Lippai <ad...@rigo.sk> wrote:
>
> +1
>
> On Thu, Jun 10, 2021, 23:38 Antoine Pitrou <an...@python.org> wrote:
>
> >
> > > Sounds good enough to me.
> >
> >
> > > On 10/06/2021 at 23:35, Wes McKinney wrote:
> > > I hate to reopen this can of worms again, but here is my effort to
> > > synthesize feedback:
> > >
> > > "Apache Arrow is a multi-language toolbox for accelerated data
> > > interchange and in-memory processing."
> > >
> > > On Thu, Jun 10, 2021 at 12:37 PM Dominik Moritz <do...@apache.org>
> > wrote:
> > >>
> > >> I thought there were some good suggestions in this thread. @Wes, did you
> > >> find a description you liked?
> > >>
> > >> On May 18, 2021 at 06:24:47, Adam Hooper <ad...@adamhooper.com> wrote:
> > >>
> > >>> Poll question: why did you choose Arrow?
> > >>>
> > >>> Personally: I researched Arrow because it's a spec for IPC. (My
> > requirement
> > >>> was: "wrap computations in a separate process.") I chose Arrow for its
> > >>> community and ecosystem -- in other words, because my peers chose it.
> > >>>
> > >>> I happen to use the compute kernel and Parquet capabilities every day;
> > but
> > >>> they did not sway me at all. I would choose Arrow if it were nothing
> > but
> > >>> this spec and this community. (I chose HTML, after all.)
> > >>>
> > >>> I see the *code* as one enormous proof that the *spec* is good, and as
> > a
> > >>> collection of examples and best practices.
> > >>>
> > >>> ... so a great pitch to me would be: "Apache Arrow is a data format and
> > >>> toolbox for efficient in-memory processing."
> > >>>
> > >>> Enjoy life,
> > >>> Adam
> > >>>
> > >>> On Tue, May 18, 2021 at 2:38 AM Aldrin <ak...@ucsc.edu.invalid>
> > wrote:
> > >>>
> > >>> "Apache Arrow is a data processing library that also provides a
> > uniform,
> > >>>
> > >>> efficient interface for data systems."
> > >>>
> > >>>
> > >>> This probably still isn't quite right, I imagine the bit about "for
> > data
> > >>>
> > >>> systems" needs some addition (maybe "for transport between data
> > systems")?
> > >>>
> > >>>
> > >>> My primary motivators:
> > >>>
> > >>>
> > >>>     - "A data processing library":
> > >>>
> > >>>        - Arrow provides many language bindings, but ultimately they're
> > all
> > >>>
> > >>>        part of the same "library ecosystem", which I think is fine to
> > >>>
> > >>> capture in
> > >>>
> > >>>        "library"
> > >>>
> > >>>        - A main goal of arrow is for processing to be fast, whatever
> > that
> > >>>
> > >>>        processing may be
> > >>>
> > >>>     - "uniform, efficient interface for data systems":
> > >>>
> > >>>        - Arrow provides (or tries to) a cohesive ("uniform")
> > interface for
> > >>>
> > >>>        data processing (although it has several APIs to do this)
> > >>>
> > >>>        - Also, IMO, a motivation for arrow was a format and library to
> > >>>
> > >>>        facilitate processing, but that provided functions and
> > >>>
> > >>> interfaces to easily
> > >>>
> > >>>        translate into optimized data formats used by disparate data
> > systems
> > >>>
> > >>>        (cassandra, hadoop, etc.).
> > >>>
> > >>>        - Arrow tries to be transparently zero-copy, which is part of
> > the
> > >>>
> > >>>        interface for efficiency
> > >>>
> > >>>     - Arrow certainly has a data format, but that format is the crux
> > of the
> > >>>
> > >>>     interface (IMO). However, it also makes using other formats easy
> > (via
> > >>>
> > >>>     filesystem API and parquet reader/writers, etc.). So, focusing on
> > the
> > >>>
> > >>> data
> > >>>
> > >>>     format seems unnecessary in such a terse description.
> > >>>
> > >>>
> > >>>
> > >>> Aldrin Montana
> > >>>
> > >>> Computer Science PhD Student
> > >>>
> > >>> UC Santa Cruz
> > >>>
> > >>>
> > >>>
> > >>> On Mon, May 17, 2021 at 5:07 PM Weston Pace <we...@gmail.com>
> > wrote:
> > >>>
> > >>>
> > >>>> I'd avoid the word "structured" as it is somewhat ill-defined.
> > >>>
> > >>>>
> > >>>
> > >>>> On Mon, May 17, 2021 at 12:37 PM Mauricio Vargas
> > >>>
> > >>>> <ma...@ursacomputing.com> wrote:
> > >>>
> > >>>>>
> > >>>
> > >>>>> more marketed:
> > >>>
> > >>>>> How about: "Apache Arrow is a format and language-agnostic library
> > >>>
> > >>>> focused
> > >>>
> > >>>>> on efficient sharing and processing of structured data."
> > >>>
> > >>>>>
> > >>>
> > >>>>> On Mon, May 17, 2021 at 6:25 PM Micah Kornfield <
> > emkornfield@gmail.com
> > >>>
> > >>>>
> > >>>
> > >>>>> wrote:
> > >>>
> > >>>>>
> > >>>
> > >>>>>> How about: "Apache Arrow is a collection of specifications, cross
> > >>>
> > >>>> language
> > >>>
> > >>>>>> libraries and applications focused on efficient sharing and
> > >>>
> > >>> processing
> > >>>
> > >>>> of
> > >>>
> > >>>>>> structured data."
> > >>>
> > >>>>>>
> > >>>
> > >>>>>> On Mon, May 17, 2021 at 3:06 PM Wes McKinney <we...@gmail.com>
> > >>>
> > >>>> wrote:
> > >>>
> > >>>>>>
> > >>>
> > >>>>>>> On Mon, May 17, 2021 at 4:58 PM Weston Pace <weston.pace@gmail.com
> > >>>
> > >>>>
> > >>>
> > >>>>>> wrote:
> > >>>
> > >>>>>>>>
> > >>>
> > >>>>>>>>> “Apache Arrow is a format and compute kernel for in-memory
> > >>>
> > >>> data”
> > >>>
> > >>>>>>>>
> > >>>
> > >>>>>>>> I like this but no one ever knows what "in-memory" means (or they
> > >>>
> > >>>> just
> > >>>
> > >>>>>>>> think 'data is always in memory').  How about...
> > >>>
> > >>>>>>>>
> > >>>
> > >>>>>>>> "Apache Arrow is a format and compute kernel for zero-copy
> > >>>
> > >>>> processing
> > >>>
> > >>>>>>>> and sharing of data."
> > >>>
> > >>>>>>>>
> > >>>
> > >>>>>>>> or...
> > >>>
> > >>>>>>>>
> > >>>
> > >>>>>>>> "Apache Arrow is a format and compute kernel for processing and
> > >>>
> > >>>>>>>> sharing data without serialization overhead."
> > >>>
> > >>>>>>>
> > >>>
> > >>>>>>> A few issues with this:
> > >>>
> > >>>>>>>
> > >>>
> > >>>>>>> * Multiple PL aspect unclear (is it a single piece of software, or
> > >>>
> > >>>>>>> multiple pieces of software?)
> > >>>
> > >>>>>>> * Development platform aspect unclear
> > >>>
> > >>>>>>>
> > >>>
> > >>>>>>> I see that some people don't like the word "platform". Some people
> > >>>
> > >>>>>>> come to this project and want to find an end-to-end application,
> > >>>
> > >>>>>>> rather than a developer toolkit that they can use to build
> > >>>
> > >>>>>>> applications. Perhaps we should be more explicit and use
> > >>>
> > >>>>>>> "computational development toolkit" instead of "platform".
> > >>>
> > >>>>>>>
> > >>>
> > >>>>>>>> Although marshalling[1] would probably be a more precise word it
> > >>>
> > >>> is
> > >>>
> > >>>>>>>> not as well known.
> > >>>
> > >>>>>>>>
> > >>>
> > >>>>>>>> [1] https://en.wikipedia.org/wiki/Marshalling_(computer_science)
> > >>>
> > >>>>>>>>
> > >>>
> > >>>>>>>> On Mon, May 17, 2021 at 9:36 AM Mauricio Vargas
> > >>>
> > >>>>>>>> <ma...@ursacomputing.com> wrote:
> > >>>
> > >>>>>>>>>
> > >>>
> > >>>>>>>>> a few ideas
> > >>>
> > >>>>>>>>>
> > >>>
> > >>>>>>>>> github.com/apache/arrow - Apache Arrow is an efficient library
> > >>>
> > >>>> for
> > >>>
> > >>>>>>> big data
> > >>>
> > >>>>>>>>> processing and sharing
> > >>>
> > >>>>>>>>>
> > >>>
> > >>>>>>>>> github.com/apache/arrow - Apache Arrow is a computational tool
> > >>>
> > >>>> for
> > >>>
> > >>>>>>>>> processing, storing and sharing large datasets
> > >>>
> > >>>>>>>>>
> > >>>
> > >>>>>>>>> github.com/apache/arrow - Apache Arrow is a fast and simple
> > >>>
> > >>>> library
> > >>>
> > >>>>>>> for
> > >>>
> > >>>>>>>>> big data analytics
> > >>>
> > >>>>>>>>>
> > >>>
> > >>>>>>>>> *github.com/apache/arrow -
> > >>>
> > >>>> Apache
> > >>>
> > >>>>>>> Arrow is
> > >>>
> > >>>>>>>>> a powerful workhorse for analytic operations on modern
> > >>>
> > >>> hardware*
> > >>>
> > >>>>>>>>>
> > >>>
> > >>>>>>>>>
> > >>>
> > >>>>>>>>> On Mon, May 17, 2021 at 3:13 PM Julian Hyde <
> > >>>
> > >>>> jhyde.apache@gmail.com>
> > >>>
> > >>>>>>> wrote:
> > >>>
> > >>>>>>>>>
> > >>>
> > >>>>>>>>>> Alright, well, whatever it is, it must fit into one breath.
> > >>>
> > >>> If
> > >>>
> > >>>> the
> > >>>
> > >>>>>>>>>> high-concept pitch is successful, people will stick around
> > >>>
> > >>> for
> > >>>
> > >>>> the
> > >>>
> > >>>>>>> full
> > >>>
> > >>>>>>>>>> pitch.
> > >>>
> > >>>>>>>>>>
> > >>>
> > >>>>>>>>>> Words such as “platform” and “enable” are noise. You say
> > >>>
> > >>>>>> “platform”,
> > >>>
> > >>>>>>> they
> > >>>
> > >>>>>>>>>> start to say “what exactly do you mean by platform”, the
> > >>>
> > >>>> elevator
> > >>>
> > >>>>>>> doors
> > >>>
> > >>>>>>>>>> open, and they’re gone.
> > >>>
> > >>>>>>>>>>
> > >>>
> > >>>>>>>>>> “Apache Arrow is a format and compute kernel for in-memory
> > >>>
> > >>>> data”
> > >>>
> > >>>>>>>>>>
> > >>>
> > >>>>>>>>>>
> > >>>
> > >>>>>>>>>>> On May 17, 2021, at 12:03 PM, Eduardo Ponce <
> > >>>
> > >>>> edponce00@gmail.com
> > >>>
> > >>>>>>>
> > >>>
> > >>>>>>> wrote:
> > >>>
> > >>>>>>>>>>>
> > >>>
> > >>>>>>>>>>> One more suggestion for the bucket:
> > >>>
> > >>>>>>>>>>> "Apache Arrow is a computational platform for efficient
> > >>>
> > >>>> in-memory
> > >>>
> > >>>>>>> data
> > >>>
> > >>>>>>>>>>> representation and processing."
> > >>>
> > >>>>>>>>>>>
> > >>>
> > >>>>>>>>>>> On Mon, May 17, 2021 at 2:49 PM Wes McKinney <
> > >>>
> > >>>>>> wesmckinn@gmail.com>
> > >>>
> > >>>>>>>>>> wrote:
> > >>>
> > >>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>> I think less is better in the description, but
> > >>>
> > >>>> unfortunately the
> > >>>
> > >>>>>>>>>>>> association of Arrow as being "just a data format" has
> > >>>
> > >>> been
> > >>>
> > >>>>>>> actively
> > >>>
> > >>>>>>>>>>>> harmful in some ways to community growth. We have a data
> > >>>
> > >>>> format,
> > >>>
> > >>>>>>> yes,
> > >>>
> > >>>>>>>>>>>> but we are also creating a computational platform to go
> > >>>
> > >>>>>>> hand-in-hand
> > >>>
> > >>>>>>>>>>>> with the data format to make it easier to build fast
> > >>>
> > >>>>>> applications
> > >>>
> > >>>>>>> that
> > >>>
> > >>>>>>>>>>>> use the data format. So the description needs to capture
> > >>>
> > >>>> both of
> > >>>
> > >>>>>>> these
> > >>>
> > >>>>>>>>>>>> ideas.
> > >>>
> > >>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>> On Mon, May 17, 2021 at 12:15 PM Julian Hyde <
> > >>>
> > >>>>>>> jhyde.apache@gmail.com>
> > >>>
> > >>>>>>>>>>>> wrote:
> > >>>
> > >>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>> I think that the “cross-language development platform
> > >>>
> > >>> for”
> > >>>
> > >>>> is
> > >>>
> > >>>>>>> noise.
> > >>>
> > >>>>>>>>>>>> (I’m sure that JPEG developers think that JPEG is a
> > >>>
> > >>>>>>> “cross-language
> > >>>
> > >>>>>>>>>>>> development platform” too. But it isn’t. It is an image
> > >>>
> > >>>> format.)
> > >>>
> > >>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>> "Apache Arrow is a data format for efficient in-memory
> > >>>
> > >>>>>> processing.”
> > >>>
> > >>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>> I’ll note that in marketing speak, we are developing a
> > >>>
> > >>>>>>> high-concept
> > >>>
> > >>>>>>>>>>>> pitch [1] here. Every company needs a name, a brand, a
> > >>>
> > >>>>>>> high-concept
> > >>>
> > >>>>>>>>>> pitch,
> > >>>
> > >>>>>>>>>>>> and 3- or 4-sentence description. But every Apache project
> > >>>
> > >>>> needs
> > >>>
> > >>>>>>> these
> > >>>
> > >>>>>>>>>> too.
> > >>>
> > >>>>>>>>>>>> It’s worth spending the time on the description, also, and
> > >>>
> > >>>> then
> > >>>
> > >>>>>>> use
> > >>>
> > >>>>>>>>>> them in
> > >>>
> > >>>>>>>>>>>> all the places that we describe Arrow.
> > >>>
> > >>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>> Julian
> > >>>
> > >>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>> [1]
> > >>>
> > >>>>>>> https://www.growthink.com/content/whats-your-high-concept-pitch
> > >>>
> > >>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>> On May 17, 2021, at 7:38 AM, Eduardo Ponce <
> > >>>
> > >>>>>> edponce00@gmail.com
> > >>>
> > >>>>>>>>
> > >>>
> > >>>>>>>>>>>> wrote:
> > >>>
> > >>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>> I agree with Nate's and Brian's suggestions, but would
> > >>>
> > >>>> like to
> > >>>
> > >>>>>>> add
> > >>>
> > >>>>>>>>>>>> that we
> > >>>
> > >>>>>>>>>>>>>> can make it a one-liner for more conciseness and
> > >>>
> > >>>> consistency
> > >>>
> > >>>>>>> with
> > >>>
> > >>>>>>>>>> other
> > >>>
> > >>>>>>>>>>>>>> Apache projects.
> > >>>
> > >>>>>>>>>>>>>> Apologies if it seems I am going around the suggestions
> > >>>
> > >>>> loop
> > >>>
> > >>>>>>> again.
> > >>>
> > >>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>> "Apache Arrow is a cross-language development platform
> > >>>
> > >>>>>> enabling
> > >>>
> > >>>>>>>>>>>> efficient
> > >>>
> > >>>>>>>>>>>>>> in-memory data processing and transport."
> > >>>
> > >>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>> On Mon, May 17, 2021 at 10:11 AM Brian Hulette <
> > >>>
> > >>>>>>> bhulette@apache.org>
> > >>>
> > >>>>>>>>>>>> wrote:
> > >>>
> > >>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>> Thank you for bringing this up Dominik. I sampled some
> > >>>
> > >>>> of the
> > >>>
> > >>>>>>>>>>>> descriptions
> > >>>
> > >>>>>>>>>>>>>>> for other Apache projects I frequent, the ones with a
> > >>>
> > >>>>>>> meaningful
> > >>>
> > >>>>>>>>>>>>>>> description have a single sentence:
> > >>>
> > >>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>> github.com/apache/spark - Apache Spark - A unified
> > >>>
> > >>>> analytics
> > >>>
> > >>>>>>> engine
> > >>>
> > >>>>>>>>>>>> for
> > >>>
> > >>>>>>>>>>>>>>> large-scale data processing
> > >>>
> > >>>>>>>>>>>>>>> github.com/apache/beam - Apache Beam is a unified
> > >>>
> > >>>>>> programming
> > >>>
> > >>>>>>> model
> > >>>
> > >>>>>>>>>>>> for
> > >>>
> > >>>>>>>>>>>>>>> Batch and Streaming
> > >>>
> > >>>>>>>>>>>>>>> github.com/apache/avro - Apache Avro is a data
> > >>>
> > >>>> serialization
> > >>>
> > >>>>>>> system
> > >>>
> > >>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>> Several others (Flink, Hadoop, ...) just have  "[Mirror
> > >>>
> > >>>> of]
> > >>>
> > >>>>>>> Apache
> > >>>
> > >>>>>>>>>>>> <name>"
> > >>>
> > >>>>>>>>>>>>>>> as the description.
> > >>>
> > >>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>> +1 for Nate's suggestion "Apache Arrow is a
> > >>>
> > >>>> cross-language
> > >>>
> > >>>>>>>>>> development
> > >>>
> > >>>>>>>>>>>>>>> platform for in-memory data. It enables systems to
> > >>>
> > >>>> process
> > >>>
> > >>>>>> and
> > >>>
> > >>>>>>>>>>>> transport
> > >>>
> > >>>>>>>>>>>>>>> data more efficiently."
> > >>>
> > >>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>> On Mon, May 17, 2021 at 5:23 AM Wes McKinney <
> > >>>
> > >>>>>>> wesmckinn@gmail.com>
> > >>>
> > >>>>>>>>>>>> wrote:
> > >>>
> > >>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>> It's probably best for description to limit mentions
> > >>>
> > >>> of
> > >>>
> > >>>>>>> specific
> > >>>
> > >>>>>>>>>>>>>>>> features. There are some high level features mentioned
> > >>>
> > >>>> in
> > >>>
> > >>>>>> the
> > >>>
> > >>>>>>>>>>>>>>>> description now ("computational libraries and
> > >>>
> > >>> zero-copy
> > >>>
> > >>>>>>> streaming
> > >>>
> > >>>>>>>>>>>>>>>> messaging and interprocess communication"), but now in
> > >>>
> > >>>> 2021
> > >>>
> > >>>>>>> since
> > >>>
> > >>>>>>>>>> the
> > >>>
> > >>>>>>>>>>>>>>>> project has grown so much, it could leave people with
> > >>>
> > >>> a
> > >>>
> > >>>>>>> limited view
> > >>>
> > >>>>>>>>>>>>>>>> of what they might find here.
> > >>>
> > >>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>> On Mon, May 17, 2021 at 12:14 AM Mauricio Vargas
> > >>>
> > >>>>>>>>>>>>>>>> <ma...@ursacomputing.com> wrote:
> > >>>
> > >>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>> How about
> > >>>
> > >>>>>>>>>>>>>>>>> 'Apache Arrow is a cross-language development
> > >>>
> > >>> platform
> > >>>
> > >>>> for
> > >>>
> > >>>>>>>>>> in-memory
> > >>>
> > >>>>>>>>>>>>>>>> data.
> > >>>
> > >>>>>>>>>>>>>>>>> It enables systems to process and transport data
> > >>>
> > >>>>>> efficiently,
> > >>>
> > >>>>>>>>>>>>>>> providing a
> > >>>
> > >>>>>>>>>>>>>>>>> simple and fast library for partitioning of large
> > >>>
> > >>>> tables'?
> > >>>
> > >>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>> Sorry the delay, long election day
> > >>>
> > >>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>> On Sun, May 16, 2021, 2:27 PM Nate Bauernfeind <
> > >>>
> > >>>>>>>>>>>>>>>> natebauernfeind@deephaven.io>
> > >>>
> > >>>>>>>>>>>>>>>>> wrote:
> > >>>
> > >>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>> Suggestion: faster -> more efficiently
> > >>>
> > >>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development
> > >>>
> > >>>> platform for
> > >>>
> > >>>>>>>>>>>> in-memory
> > >>>
> > >>>>>>>>>>>>>>>>>> data. It enables systems to process and transport
> > >>>
> > >>> data
> > >>>
> > >>>>>> more
> > >>>
> > >>>>>>>>>>>>>>>> efficiently."
> > >>>
> > >>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>> On Sun, May 16, 2021 at 11:35 AM Wes McKinney <
> > >>>
> > >>>>>>>>>> wesmckinn@gmail.com
> > >>>
> > >>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>> wrote:
> > >>>
> > >>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>>> Here's what there now:
> > >>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development
> > >>>
> > >>>> platform
> > >>>
> > >>>>>> for
> > >>>
> > >>>>>>>>>>>>>>> in-memory
> > >>>
> > >>>>>>>>>>>>>>>>>>> data. It specifies a standardized
> > >>>
> > >>>> language-independent
> > >>>
> > >>>>>>> columnar
> > >>>
> > >>>>>>>>>>>>>>>> memory
> > >>>
> > >>>>>>>>>>>>>>>>>>> format for flat and hierarchical data, organized
> > >>>
> > >>> for
> > >>>
> > >>>>>>> efficient
> > >>>
> > >>>>>>>>>>>>>>>>>>> analytic operations on modern hardware. It also
> > >>>
> > >>>> provides
> > >>>
> > >>>>>>>>>>>>>>>> computational
> > >>>
> > >>>>>>>>>>>>>>>>>>> libraries and zero-copy streaming messaging and
> > >>>
> > >>>>>>> interprocess
> > >>>
> > >>>>>>>>>>>>>>>>>>> communication…"
> > >>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>>> How about something shorter like
> > >>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development
> > >>>
> > >>>> platform
> > >>>
> > >>>>>> for
> > >>>
> > >>>>>>>>>>>>>>> in-memory
> > >>>
> > >>>>>>>>>>>>>>>>>>> data. It enables systems to process and transport
> > >>>
> > >>>> data
> > >>>
> > >>>>>>> faster."
> > >>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>>> Suggestions / refinements from others welcome
> > >>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>>> On Sat, May 15, 2021 at 9:12 PM Dominik Moritz <
> > >>>
> > >>>>>>> domoritz@cmu.edu
> > >>>
> > >>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>> wrote:
> > >>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>>>> Super minor issue but could someone make the
> > >>>
> > >>>> description
> > >>>
> > >>>>>>> on
> > >>>
> > >>>>>>>>>>>>>>> GitHub
> > >>>
> > >>>>>>>>>>>>>>>>>>> shorter?
> > >>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>>>> GitHub puts the description into the title of the
> > >>>
> > >>>> page
> > >>>
> > >>>>>>> and makes
> > >>>
> > >>>>>>>>>>>>>>> it
> > >>>
> > >>>>>>>>>>>>>>>>>> hard
> > >>>
> > >>>>>>>>>>>>>>>>>>> to find it in URL autocomplete.
> > >>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>>> --
> > >>>
> > >>>>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>>>
> > >>>
> > >>>>>>>>>>
> > >>>
> > >>>>>>>>>>
> > >>>
> > >>>>>>>
> > >>>
> > >>>>>>
> > >>>
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Adam Hooper
> > >>> +1-514-882-9694
> > >>> http://adamhooper.com
> > >>>
> >

Re: Long title on github page

Posted by Adam Lippai <ad...@rigo.sk>.
+1

On Thu, Jun 10, 2021, 23:38 Antoine Pitrou <an...@python.org> wrote:

>
> Sounds good enough to me.
>
>
> On 10/06/2021 at 23:35, Wes McKinney wrote:
> > I hate to reopen this can of worms again, but here is my effort to
> > synthesize feedback:
> >
> > "Apache Arrow is a multi-language toolbox for accelerated data
> > interchange and in-memory processing."
> >
> > On Thu, Jun 10, 2021 at 12:37 PM Dominik Moritz <do...@apache.org> wrote:
> >>
> >> I thought there were some good suggestions in this thread. @Wes, did you
> >> find a description you liked?
>

Re: Long title on github page

Posted by Antoine Pitrou <an...@python.org>.
Sounds good enough to me.


On 10/06/2021 at 23:35, Wes McKinney wrote:
> I hate to reopen this can of worms again, but here is my effort to
> synthesize feedback:
> 
> "Apache Arrow is a multi-language toolbox for accelerated data
> interchange and in-memory processing."
> 
> On Thu, Jun 10, 2021 at 12:37 PM Dominik Moritz <do...@apache.org> wrote:
>>
>> I thought there were some good suggestions in this thread. @Wes, did you
>> find a description you liked?
>>>
>>>>>>> “cross-language
>>>
>>>>>>>>>>>> development platform” too. But it isn’t. It is an image
>>>
>>>> format.)
>>>
>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>> "Apache Arrow is a data format for efficient in-memory
>>>
>>>>>> processing.”
>>>
>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>> I’ll note that in marketing speak, we are developing a
>>>
>>>>>>> high-concept
>>>
>>>>>>>>>>>> pitch [1] here. Every company needs a name, a brand, a
>>>
>>>>>>> high-concept
>>>
>>>>>>>>>> pitch,
>>>
>>>>>>>>>>>> and 3- or 4-sentence description. But every Apache project
>>>
>>>> needs
>>>
>>>>>>> these
>>>
>>>>>>>>>> too.
>>>
>>>>>>>>>>>> It’s worth spending the time on the description, also, and
>>>
>>>> then
>>>
>>>>>>> use
>>>
>>>>>>>>>> them in
>>>
>>>>>>>>>>>> all the places that we describe Arrow.
>>>
>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>> Julian
>>>
>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>> [1]
>>>
>>>>>>> https://www.growthink.com/content/whats-your-high-concept-pitch
>>>
>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>> On May 17, 2021, at 7:38 AM, Eduardo Ponce <
>>>
>>>>>> edponce00@gmail.com
>>>
>>>>>>>>
>>>
>>>>>>>>>>>> wrote:
>>>
>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>> I agree with Nate's and Brian's suggestions, but would
>>>
>>>> like to
>>>
>>>>>>> add
>>>
>>>>>>>>>>>> that we
>>>
>>>>>>>>>>>>>> can make it a one-liner for more conciseness and
>>>
>>>> consistency
>>>
>>>>>>> with
>>>
>>>>>>>>>> other
>>>
>>>>>>>>>>>>>> Apache projects.
>>>
>>>>>>>>>>>>>> Apologies if it seems I am going around the suggestions
>>>
>>>> loop
>>>
>>>>>>> again.
>>>
>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>> "Apache Arrow is a cross-language development platform
>>>
>>>>>> enabling
>>>
>>>>>>>>>>>> efficient
>>>
>>>>>>>>>>>>>> in-memory data processing and transport."
>>>
>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>> On Mon, May 17, 2021 at 10:11 AM Brian Hulette <
>>>
>>>>>>> bhulette@apache.org>
>>>
>>>>>>>>>>>> wrote:
>>>
>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>> Thank you for bringing this up Dominik. I sampled some
>>>
>>>> of the
>>>
>>>>>>>>>>>> descriptions
>>>
>>>>>>>>>>>>>>> for other Apache projects I frequent, the ones with a
>>>
>>>>>>> meaningful
>>>
>>>>>>>>>>>>>>> description have a single sentence:
>>>
>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>> github.com/apache/spark - Apache Spark - A unified
>>>
>>>> analytics
>>>
>>>>>>> engine
>>>
>>>>>>>>>>>> for
>>>
>>>>>>>>>>>>>>> large-scale data processing
>>>
>>>>>>>>>>>>>>> github.com/apache/beam - Apache Beam is a unified
>>>
>>>>>> programming
>>>
>>>>>>> model
>>>
>>>>>>>>>>>> for
>>>
>>>>>>>>>>>>>>> Batch and Streaming
>>>
>>>>>>>>>>>>>>> github.com/apache/avro - Apache Avro is a data
>>>
>>>> serialization
>>>
>>>>>>> system
>>>
>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>> Several others (Flink, Hadoop, ...) just have "[Mirror
>>>
>>>> of]
>>>
>>>>>>> Apache
>>>
>>>>>>>>>>>> <name>"
>>>
>>>>>>>>>>>>>>> as the description.
>>>
>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>> +1 for Nate's suggestion "Apache Arrow is a
>>>
>>>> cross-language
>>>
>>>>>>>>>> development
>>>
>>>>>>>>>>>>>>> platform for in-memory data. It enables systems to
>>>
>>>> process
>>>
>>>>>> and
>>>
>>>>>>>>>>>> transport
>>>
>>>>>>>>>>>>>>> data more efficiently."
>>>
>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>> On Mon, May 17, 2021 at 5:23 AM Wes McKinney <
>>>
>>>>>>> wesmckinn@gmail.com>
>>>
>>>>>>>>>>>> wrote:
>>>
>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>> It's probably best for the description to limit mentions
>>>
>>> of
>>>
>>>>>>> specific
>>>
>>>>>>>>>>>>>>>> features. There are some high level features mentioned
>>>
>>>> in
>>>
>>>>>> the
>>>
>>>>>>>>>>>>>>>> description now ("computational libraries and
>>>
>>> zero-copy
>>>
>>>>>>> streaming
>>>
>>>>>>>>>>>>>>>> messaging and interprocess communication"), but now in
>>>
>>>> 2021
>>>
>>>>>>> since
>>>
>>>>>>>>>> the
>>>
>>>>>>>>>>>>>>>> project has grown so much, it could leave people with
>>>
>>> a
>>>
>>>>>>> limited view
>>>
>>>>>>>>>>>>>>>> of what they might find here.
>>>
>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>> On Mon, May 17, 2021 at 12:14 AM Mauricio Vargas
>>>
>>>>>>>>>>>>>>>> <ma...@ursacomputing.com> wrote:
>>>
>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>> How about
>>>
>>>>>>>>>>>>>>>>> 'Apache Arrow is a cross-language development
>>>
>>> platform
>>>
>>>> for
>>>
>>>>>>>>>> in-memory
>>>
>>>>>>>>>>>>>>>> data.
>>>
>>>>>>>>>>>>>>>>> It enables systems to process and transport data
>>>
>>>>>> efficiently,
>>>
>>>>>>>>>>>>>>> providing a
>>>
>>>>>>>>>>>>>>>>> simple and fast library for partitioning of large
>>>
>>>> tables'?
>>>
>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>> Sorry for the delay, long election day
>>>
>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>> On Sun, May 16, 2021, 2:27 PM Nate Bauernfeind <
>>>
>>>>>>>>>>>>>>>> natebauernfeind@deephaven.io>
>>>
>>>>>>>>>>>>>>>>> wrote:
>>>
>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>> Suggestion: faster -> more efficiently
>>>
>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development
>>>
>>>> platform for
>>>
>>>>>>>>>>>> in-memory
>>>
>>>>>>>>>>>>>>>>>> data. It enables systems to process and transport
>>>
>>> data
>>>
>>>>>> more
>>>
>>>>>>>>>>>>>>>> efficiently."
>>>
>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>> On Sun, May 16, 2021 at 11:35 AM Wes McKinney <
>>>
>>>>>>>>>> wesmckinn@gmail.com
>>>
>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>> wrote:
>>>
>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>>> Here's what's there now:
>>>
>>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development
>>>
>>>> platform
>>>
>>>>>> for
>>>
>>>>>>>>>>>>>>> in-memory
>>>
>>>>>>>>>>>>>>>>>>> data. It specifies a standardized
>>>
>>>> language-independent
>>>
>>>>>>> columnar
>>>
>>>>>>>>>>>>>>>> memory
>>>
>>>>>>>>>>>>>>>>>>> format for flat and hierarchical data, organized
>>>
>>> for
>>>
>>>>>>> efficient
>>>
>>>>>>>>>>>>>>>>>>> analytic operations on modern hardware. It also
>>>
>>>> provides
>>>
>>>>>>>>>>>>>>>> computational
>>>
>>>>>>>>>>>>>>>>>>> libraries and zero-copy streaming messaging and
>>>
>>>>>>> interprocess
>>>
>>>>>>>>>>>>>>>>>>> communication…"
>>>
>>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>>> How about something shorter like
>>>
>>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>>> "Apache Arrow is a cross-language development
>>>
>>>> platform
>>>
>>>>>> for
>>>
>>>>>>>>>>>>>>> in-memory
>>>
>>>>>>>>>>>>>>>>>>> data. It enables systems to process and transport
>>>
>>>> data
>>>
>>>>>>> faster."
>>>
>>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>>> Suggestions / refinements from others welcome
>>>
>>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>>> On Sat, May 15, 2021 at 9:12 PM Dominik Moritz <
>>>
>>>>>>> domoritz@cmu.edu
>>>
>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>> wrote:
>>>
>>>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>>>> Super minor issue but could someone make the
>>>
>>>> description
>>>
>>>>>>> on
>>>
>>>>>>>>>>>>>>> GitHub
>>>
>>>>>>>>>>>>>>>>>>> shorter?
>>>
>>>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>>>> GitHub puts the description into the title of the
>>>
>>>> page
>>>
>>>>>>> and makes
>>>
>>>>>>>>>>>>>>> it
>>>
>>>>>>>>>>>>>>>>>> hard
>>>
>>>>>>>>>>>>>>>>>>> to find it in URL autocomplete.
>>>
>>>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>>> --
>>>
>>>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>>
>>>
>>>>>>>>>>>>
>>>
>>>>>>>>>>
>>>
>>>>>>>>>>
>>>
>>>>>>>
>>>
>>>>>>
>>>
>>>>
>>>
>>>
>>>
>>>
>>> --
>>> Adam Hooper
>>> +1-514-882-9694
>>> http://adamhooper.com
>>>

Re: Long title on github page

Posted by Wes McKinney <we...@gmail.com>.
I hate to reopen this can of worms again, but here is my effort to
synthesize feedback:

"Apache Arrow is a multi-language toolbox for accelerated data
interchange and in-memory processing."

On Thu, Jun 10, 2021 at 12:37 PM Dominik Moritz <do...@apache.org> wrote:
>
> I thought there were some good suggestions in this thread. @Wes, did you
> find a description you liked?