Posted to commits@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/28 22:59:34 UTC

[GitHub] [arrow-site] nealrichardson opened a new pull request #63: Revamp website for 1.0 release

nealrichardson opened a new pull request #63:
URL: https://github.com/apache/arrow-site/pull/63


   I've started a revision of the Arrow website to coincide with our 1.0 release. This is very much a WIP. There are a number of old JIRAs I'll fold into this, and some new ones I'll add too.
   
   The primary objective of the revision is to orient the website at a different audience. When the site was created, Arrow was just an idea, and the focus of the homepage seemed to be around convincing (a very small subset of) people that designing a standard columnar memory format was something they should participate in. That now exists--and so much more does too. 
   
   We need our website to do different things now:
   
   * Tell new/prospective users who've heard of Arrow what it can do for them and how they can get started using it
   * Persuade projects/maintainers that Arrow is the standard they should use, and that they should participate in our community rather than roll their own
   
   I did a quick survey of Apache project websites for comparison. Most are like our current site--clearly a basic Bootstrap theme and pretty neglected--and some are worse (older than Bootstrap). Of them, Spark and Kudu have the most features worth learning from.
   
   At some point it would be interesting to engage an actual web designer, if possible (I can explore that). But even without that, we need to get our messaging worked out. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] nealrichardson commented on a change in pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r460204935



##########
File path: faq.md
##########
@@ -24,32 +24,160 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Flight RPC), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+Some implementations of Arrow are more complete and more stable than others.
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).

Review comment:
       I've done this.







[GitHub] [arrow-site] nealrichardson commented on pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-646097171


   > Perhaps the Getting started and Install pages should be per-implementation, rather than global. The current draft for the Getting started page is just a bunch of links to other parts of the documentation.
   
   Maybe so, or maybe these pages could have a tabbed view such that you click on "Python" and just see the Python guide. I'm open to suggestion. My goal is to enable someone unfamiliar with the project to be able to go to the Arrow website and quickly see how to download and start using Arrow. 
   
   The bigger challenges are (1) the general lack of getting-started user guides in the project and (2) the ones we do have are scattered across each language's docs and readmes. 
   
   





[GitHub] [arrow-site] nealrichardson commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r449257551



##########
File path: index.html
##########
@@ -1,72 +1,62 @@
 ---
-layout: default
+layout: home
 ---
-<div class="jumbotron">
-    <h1>Apache Arrow</h1>
-    <p class="lead">A cross-language development platform for in-memory data</p>
-    <p>
-        <a class="btn btn-lg btn-success" style="white-space: normal;" href="mailto:dev-subscribe@arrow.apache.org" role="button">Join Mailing List</a>
-        <a class="btn btn-lg btn-primary" style="white-space: normal;" href="{{ site.baseurl }}/install/" role="button">Install ({{site.data.versions['current'].number}} Release - {{site.data.versions['current'].date}})</a>
-    </p>
-</div>
-<h5>
-  Interested in contributing?
-  <small class="text-muted">Join the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/"><strong>mailing list</strong></a> or check out the <a href="https://cwiki.apache.org/confluence/display/ARROW"><strong>developer wiki</strong></a>.</small>
-</h5>
-<h5>
-  <a href="{{ site.baseurl }}/blog/"><strong>See Latest News</strong></a>
-</h5>
-<p>
-  {{ site.description }}
-</p>
-<hr />
+<h1>What is Arrow?</h1>
 <div class="row">
   <div class="col-lg-4">
-      <h2 class="mt-3">Fast</h2>
-      <p>Apache Arrow&#8482; enables execution engines to take advantage of the latest SIMD (Single instruction, multiple data) operations included in modern processors, for native vectorized optimization of analytical data processing. Columnar layout is optimized for data locality for better performance on modern hardware like CPUs and GPUs.</p>
-      <p>The Arrow memory format supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <h2 class="mt-3">Format</h2>
+      <p>Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <p><a href="{{ site.baseurl }}/overview/">Learn more</a> about the design or
+        <a href="{{ site.baseurl }}/docs/format/Columnar.html">read the specification</a>.</p>
   </div>
   <div class="col-lg-4">
-      <h2 class="mt-3">Flexible</h2>
-      <p>Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust implementations are in progress and more languages are welcome.
+      <h2 class="mt-3">Libraries</h2>
+      <p>The Arrow project includes libraries that implement the memory specification in many languages. They enable you to use the Arrow format as an efficient means of sharing data across languages and processes. Libraries are available for <a href="{{ site.baseurl }}/docs/c_glib/">C</a>, <a href="{{ site.baseurl }}/docs/cpp/">C++</a>, <a href="https://github.com/apache/arrow/blob/master/csharp/README.md">C#</a>, <a href="https://godoc.org/github.com/apache/arrow/go/arrow">Go</a>, <a href="{{ site.baseurl }}/docs/java/">Java</a>, <a href="{{ site.baseurl }}/docs/js/">JavaScript</a>, <a href="https://github.com/apache/arrow/blob/master/matlab/README.md">MATLAB</a>, <a href="{{ site.baseurl }}/docs/python/">Python</a>, <a href="{{ site.baseurl }}/docs/r/">R</a>, <a href="https://github.com/apache/arrow/blob/master/ruby/README.md">Ruby</a>, and <a href="https://docs.rs/crate/arrow/">Rust</a>.
       </p>
+      See <a href="{{ site.baseurl }}/install/">how to install</a> and <a href="{{ site.baseurl }}/getting_started/">get started</a>.
   </div>
   <div class="col-lg-4">
-      <h2 class="mt-3">Standard</h2>
-      <p>Apache Arrow is backed by key developers of 13 major open source projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm making it the de-facto standard for columnar in-memory analytics.</p>
-      <p>Learn more about projects that are <a href="{{ site.baseurl }}/powered_by/">Powered By Apache Arrow</a></p>
+      <h2 class="mt-3">Applications</h2>
+      <p>Arrow libraries provide a foundation for developers to build fast analytics applications. <a href="{{ site.baseurl }}/powered_by/">Many popular projects</a> use Arrow to ship columnar data efficiently or as the basis for analytic engines.
+      <p>The libraries also include built-in features for working with data directly, including Parquet file reading and querying large datasets. See more Arrow <a href="{{ site.baseurl }}/use_cases/">use cases</a>.</p>

Review comment:
       I did this and I think it works well. Will try to fill in the Overview page now, as best I can.







[GitHub] [arrow-site] pitrou commented on pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-654736619


   I'm a bit surprised by the very short home page but I suppose there was a reasoning behind that.
   Overall this looks like a nice overhaul. In particular the top-level navigation bar is _much better_ than the old one.





[GitHub] [arrow-site] pitrou commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r438956370



##########
File path: faq.md
##########
@@ -24,32 +24,160 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Flight RPC), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable

Review comment:
       I'm just rephrasing the original contents, which were written by @wesm AFAIR.







[GitHub] [arrow-site] wesm commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r445204740



##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
 
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+## Getting involved
 
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [private@arrow.apache.org](mailto:private@arrow.apache.org).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
 
 ### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques.  It is ideal when you want to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes.  Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications.  Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
 
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+### What about "Arrow files" then?
 
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expense of CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file format".
 
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation.  Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.
 
-What about "Arrow files" then? Apache Arrow defines a binary "serialization" protocol for arranging a collection of Arrow columnar arrays (called a "record batch") that can be used for messaging and interprocess communication. You can put the protocol anywhere, including on disk, which can later be memory-mapped or read into memory and sent elsewhere.
+Some things to keep in mind when comparing the Arrow IPC file format and the
+Parquet format:
 
-This Arrow protocol is designed so that you can "map" a blob of Arrow data without doing any deserialization, so performing analytics on Arrow protocol data on disk can use memory-mapping and pay effectively zero cost. The protocol is used for many other things as well, such as streaming data between Spark SQL and Python for running pandas functions against chunks of Spark SQL data (these are called "pandas udfs").
+* Parquet is safe for long-term storage and archival purposes, meaning if
+  you write a file today, you can expect that any system that says they can
+  "read Parquet" will be able to read the file in 5 years or 10 years.
+  We are not yet making this assertion about long-term stability of the Arrow
+  format.
 
-In some applications, Parquet and Arrow can be used interchangeably for on-disk data serialization. Some things to keep in mind:
+* Reading Parquet files generally requires expensive decoding, while reading
+  Arrow IPC files is just a matter of transferring raw bytes from the storage
+  hardware.

Review comment:
       Right. 







[GitHub] [arrow-site] pitrou commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r438956713



##########
File path: faq.md
##########
@@ -24,32 +24,160 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Flight RPC), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+Some implementations of Arrow are more complete and more stable than others.
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
+
+## Getting involved
 
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+### I have some questions. How can I get help?
 
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [private@arrow.apache.org](mailto:private@arrow.apache.org).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
 
 ### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
 
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+Parquet files are designed for disk storage, while Arrow is designed for in-memory use,
+though you can put it on disk and then memory-map later. Arrow and Parquet are
+intended to be compatible with each other and used together in applications.
 
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expense of CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques.  It is ideal when you want to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on, and it must be decoded in
+large chunks.
 
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes.  Arrow data is not compressed (or only lightly so,

Review comment:
       This paragraph is about the in-memory format, though.







[GitHub] [arrow-site] nealrichardson commented on a change in pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r456533214



##########
File path: _posts/2020-07-16-1.0.0-release.md
##########
@@ -0,0 +1,90 @@
+---
+layout: post
+title: "Apache Arrow 1.0.0 Release"
+date: "2020-07-16 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 1.0.0 release. This covers
+over 2 months of development work and includes [**XX resolved issues**][1]
+from [**XX distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## 1.0 Format Release
+
+The 1.0 major release indicates that the Arrow columnar format is declared
+stable, with [forward and backward compatibility guarantees][5].
+
+Integration testing, link to implementation matrix
+
+Format changes: unions, unsigned int dictionary indices, decimal bit width, feature enum.
+
+## Community
+
+Since the last release, we have added two new committers:
+
+* Liya Fan
+* Ji Liu
+
+Thank you for all your contributions!
+
+<!-- Acknowledge and link to any new committers and PMC members since the last release. See previous release announcements for examples. -->
+
+## Arrow Flight RPC notes
+
+## C++ notes
+
+## C# notes
+
+## Go notes
+
+## Java notes
+

Review comment:
       > For more on what’s in the 1.0.0 R package, see the [Java changelog][5].
   
   If there is a Java changelog, please add a link to it below and give it a higher number here (5 is already taken). Also change where this says "R package".
   
   If you do this as a "suggestion" comment, I can just merge it here.







[GitHub] [arrow-site] fsaintjacques commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r439406667



##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
 
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+## Getting involved
 
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [private@arrow.apache.org](mailto:private@arrow.apache.org).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
 
 ### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques.  It is ideal when you want to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes.  Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in a natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications.  Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
 
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+### What about "Arrow files" then?
 
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expensive CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file format".
 
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation.  Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.

Review comment:
       ```suggestion
   IPC file can use memory-mapping, avoiding any deserialization cost and extra copies.
   ```
   
   You still need to copy from disk to memory.







[GitHub] [arrow-site] nealrichardson commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r444533958



##########
File path: use_cases.md
##########
@@ -0,0 +1,98 @@
+---
+layout: default
+title: Use cases
+description: Example use cases for the Apache Arrow project
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+# Use Cases
+
+Here are some example applications of the Apache Arrow format and libraries.
+For more, see our [blog]({{ site.baseurl }}/blog/) and the list of projects
+[powered by Arrow]({{ site.baseurl }}/powered_by/).
+
+## Reading/writing columnar storage formats
+
+Many Arrow libraries provide convenient methods for reading and writing
+columnar file formats, including the Arrow IPC file format ("Feather")
+and the [Apache Parquet](https://parquet.apache.org/) format.
+
+<!-- Link to implementation matrix? -->
+
+* Feather: C++, [Python]({{ site.baseurl }}/docs/python/feather.html }}),
+  [R]({{ site.baseurl }}/docs/r/reference/read_feather.html)
+* Parquet: [C++]({{ site.baseurl }}/docs/cpp/parquet.html),
+  [Python]({{ site.baseurl }}/docs/python/parquet.html),
+  [R]({{ site.baseurl }}/docs/r/reference/read_parquet.html)
+
+In addition to single-file readers, some libraries (C++,
+[Python]({{ site.baseurl }}/docs/python/dataset.html),
+[R]({{ site.baseurl }}/docs/r/articles/dataset.html)) support reading
+entire directories of files and treating them as a single dataset. These
+datasets may be on the local file system or on a remote storage system, such
+as HDFS, S3, etc.
+
+## Sharing memory locally
+
+Arrow IPC files can be memory-mapped locally, which allow you to work with
+data bigger than memory and to share data across languages and processes.
+<!-- example? -->
+
+The Arrow project includes [Plasma]({% post_url 2017-08-08-plasma-in-memory-object-store %}),
+a shared-memory object store written in C++ and exposed in Python. Plasma
+holds immutable objects in shared memory so that they can be accessed
+efficiently by many clients across process boundaries.
+
+The Arrow format also defines a [C data interface]({% post_url 2020-05-04-introducing-arrow-c-data-interface %}),
+which allows zero-copy data sharing inside a single process without any
+build-time or link-time dependency requirements. This allows, for example,
+[R users to access `pyarrow`-based projects]({{ site.baseurl }}/docs/r/articles/python.html)
+using the `reticulate` package.
+
+## Moving data over the network
+
+The Arrow format allows serializing and shipping columnar data
+over the network - or any kind of streaming transport.
+[Apache Spark](https://spark.apache.org/) uses Arrow as a
+data interchange format, and both [PySpark]({% post_url 2017-07-26-spark-arrow %})
+and [sparklyr]({% post_url 2019-01-25-r-spark-improvements %}) can take
+advantage of Arrow for significant performance gains when transferring data.
+[Google BigQuery](https://cloud.google.com/bigquery/docs/reference/storage),
+[TensorFlow](https://www.tensorflow.org/tfx),
+[AWS Athena](https://docs.aws.amazon.com/athena/latest/ug/connect-to-a-data-source.html),
+and [others]({{ site.baseurl }}/powered_by/) also use Arrow similarly.
+
+The Arrow project also defines [Flight]({% post_url 2019-09-30-introducing-arrow-flight %}),
+a client-server RPC framework to build rich services exchanging data according
+to application-defined semantics.
+
+<!-- turbodbc -->
+
+## In-memory data structure for analytics
+
+The Arrow format is designed to enable fast computation. Some projects have
+begun to take advantage of that design.  Within the Apache Arrow project,
+[DataFusion]({% post_url 2019-02-04-datafusion-donation %}) is a query engine
+using Arrow data built in Rust.
+

Review comment:
       C++ compute is more potential than reality at this point I think. Either way, for that and for Gandiva, if we have working examples/blog posts we can link to that show them delivering value already, let's use them







[GitHub] [arrow-site] pitrou commented on a change in pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r456640654



##########
File path: _posts/2020-07-16-1.0.0-release.md
##########
@@ -0,0 +1,211 @@
+---
+layout: post
+title: "Apache Arrow 1.0.0 Release"
+date: "2020-07-16 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 1.0.0 release. This covers
+over 2 months of development work and includes [**XX resolved issues**][1]
+from [**XX distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## 1.0 Format Release
+
+The Arrow format received several changes and additions, leading to the
+1.0 format version:
+
+* The metadata version was bumped to a new version V5, indicating an
+  incompatible change in the buffer layout of Union types (ARROW-9258).
+  All other types keep the same layout as in V4.
+
+* Dictionary indices are now allowed to be unsigned rather than signed
+  (ARROW-9259). Using UInt64 is still discouraged because of poor Java
+  support.
+
+* A "Feature" enum has been added to announce the use of specific optional
+  features in an IPC stream, such as buffer compression (ARROW-9308).  This
+  new field is not used by any implementation yet.
+
+* Optional buffer compression using LZ4 or ZStandard was added to the IPC
+  format (ARROW-300).
+
+* Decimal types now have an optional "bitWidth" field, defaulting to 128
+  (ARROW-8985).  This will allow for future support of other decimal widths
+  such as 32- and 64-bit.
+
+The 1.0 major release indicates that the Arrow columnar format is declared
+stable, with [forward and backward compatibility guarantees][5].
+
+Integration testing has been expanded to test for extension types and
+nested dictionaries.
+
+XXX Link to implementation matrix
+
+Format changes: unions, unsigned int dictionary indices, decimal bit width, feature enum.
+
+## Community
+
+Since the last release, we have added two new committers:
+
+* Liya Fan
+* Ji Liu
+
+Thank you for all your contributions!
+
+<!-- Acknowledge and link to any new committers and PMC members since the last release. See previous release announcements for examples. -->
+
+## Arrow Flight RPC notes
+
+Flight now offers DoExchange, a fully bidirectional data endpoint, in addition
+to DoGet and DoPut, in C++, Java, and Python. Middlewares in all languages now
+expose binary-valued headers. Additionally, servers and clients can set Arrow
+IPC read/write options in all languages, making compatibility easier with earlier
+versions of Arrow Flight.
+
+In C++ and Python, Flight now exposes more options from gRPC, including the
+address of the client (on the server) and the ability to set low-level gRPC
+client options. Flight also supports mutual TLS authentication and the ability
+for a client to control the size of a data message on the wire.
+
+## C++ notes
+
+Support for static linking with Arrow has been vastly improved, including the
+introduction of a `libarrow_bundled_dependencies.a` library bundling all
+required dependencies together (ARROW-7605).
+
+Following the Arrow format changes, Union arrays cannot have a top-level
+bitmap anymore (ARROW-9278; see also the IPC changes below).
+
+A number of improvements were made to reduce the overall code size in the
+Arrow library.
+
+A convenience API `GetBuildInfo` allows querying the characteristics of
+the Arrow library (ARROW-6521).  We encourage you to suggest any desired
+addition to the returned information.
+
+We added an optional dependency on the `utf8proc` library, used in several
+compute functions (ARROW-8961; see below).
+
+Instead of sharing the same concrete classes, sparse and dense unions now
+have separate classes (`SparseUnionType` and `DenseUnionType`, as well
+as `SparseUnionArray`, `DenseUnionArray`, `SparseUnionScalar`,
+`DenseUnionScalar`; ARROW-8866).
+
+Arrow can now be built for iOS using the right set of CMake options, though
+we don't officially support it (ARROW-8795).  See
+[this writeup](https://github.com/UnfoldedInc/deck.gl-native-dependencies/blob/master/docs/iOS-BUILD.md#arrow-v0170)
+for details.
+
+### Compute functions
+
+The compute kernel layer was extensively reworked (ARROW-8792).  It now offers
+a generic function lookup, dispatch and execution mechanism.  Furthermore,
+elaborate internal scaffolding makes it vastly easier to write new function
+kernels.
+
+Several compute functions have been added.  Unicode-compliant predicates and
+transforms, such as lowercase and uppercase transforms, are now available.
+
+The available compute functions are listed exhaustively in the Sphinx-generated
+documentation.
+
+### Datasets
+
+Datasets can now be read from CSV files (ARROW-7759).
+
+### Feather
+
+The Feather format is now available in version 2, which is simply the Arrow
+IPC file format with another name.
+
+### IPC
+
+By default, we now write IPC streams with metadata V5.  However, metadata V4
+can be requested by setting the appropriate member in `IpcWriteOptions`.
+
+V4 as well as V5 metadata IPC streams can be read properly, with one
+exception: a V4 metadata stream containing Union arrays with top-level
+null values will be rejected.
+
+Support for dictionary replacement and dictionary delta was implemented
+(ARROW-7285).
+
+### Parquet
+
+Writing files with the LZ4 codec is disabled, because it produces files
+incompatible with the widely-used Hadoop Parquet implementation (ARROW-9424).
+Support will be reenabled once we align the LZ4 implementation with the
+special buffer encoding expected by Hadoop.
+
+## C# notes
+
+## Go notes
+
+## Java notes
+
+## JavaScript notes
+
+## Python notes
+
+The size of wheel packages is significantly reduced.  One side effect is
+that these wheels do not enable Gandiva anymore (ARROW-5082).
+
+The Scalar class hierarchy was reworked to more closely follow its C++
+counterpart (ARROW-9017).
+
+TLS CA certificates are looked up more reliably when using the S3 filesystem,
+especially with manylinux wheels (ARROW-9261).
+
+The encoding of CSV files can now be specified explicitly, defaulting to UTF8
+(ARROW-9106).  Custom timestamp parsers can now be used for CSV files
+(ARROW-8711).
+
+Filesystems can now be implemented in pure Python (ARROW-8766).  As a result,
+[fsspec](https://filesystem-spec.readthedocs.io)-based filesystems can now
+be used in datasets (ARROW-9383).
+
+## R notes
+
+The R package added support for converting to and from many additional Arrow types. Tables showing how R types are mapped to Arrow types and vice versa have been added to the [introductory vignette][6], and nearly all types are handled. In addition, R `attributes` like custom classes and metadata are now preserved when converting a `data.frame` to an Arrow Table and are restored when loading them back into R.

Review comment:
       Please, can text be properly word-wrapped at ~80 characters for easier review and editing?







[GitHub] [arrow-site] nealrichardson commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r438947093



##########
File path: _layouts/home.html
##########
@@ -0,0 +1,21 @@
+{% include top.html %}
+
+<body class="wrap">
+  <header>
+    {% include header.html %}
+  </header>
+  <div class="big-arrow-bg">
+    <div class="container p-lg-4 centered">
+      <img src="{{ site.baseurl }}/img/arrow-inverse.png" style="max-width: 80%;"/>

Review comment:
       We can experiment; this also seems like the kind of thing to get a real designer's opinion on.







[GitHub] [arrow-site] pitrou commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r439424267



##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
 
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+## Getting involved
 
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [private@arrow.apache.org](mailto:private@arrow.apache.org).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
 
 ### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques.  It is ideal when you want to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes.  Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in a natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications.  Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
 
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+### What about "Arrow files" then?
 
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expensive CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file format".
 
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation.  Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.

Review comment:
       Agreed. This is a common misconception we sometimes seem to be vulnerable to.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] nealrichardson merged pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
nealrichardson merged pull request #63:
URL: https://github.com/apache/arrow-site/pull/63


   





[GitHub] [arrow-site] nealrichardson commented on pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-648454225


   > @nealrichardson you shouldn't downplay your visual design skills too much, it's in any case already better than what we have now ;)
   
   Underpromise and overdeliver ;)
   
   > 
   > Some quick general comments from looking at the preview site (not necessarily related to the changes here though, might be issues on the existing site as well):
   > 
   > * The "Get Arrow" (dropdown) -> "Releases" page lists the changelogs for each release, but there is no link from here to the more human-readable release blogposts (while those post do link to the changelog). Maybe we could have both links in the bullet points?
   
   Good idea. I think this is autogenerated as part of the release process -- @kszucs you probably know for sure. Maybe we can add links to the release blog posts after the fact (where applicable, since not all releases have blog posts) in such a way that it wouldn't break any assumptions of the script that updates the page when we do a release.
   
   > * The "Get Arrow" -> "Getting Started" and "Install" pages could maybe be combined? Or at least the difference in scope is not directly clear to me (eg for R it basically mentions the same)
   
   The Install page is currently a reference for release artifacts and package managers, many of which cut across languages, while Getting Started would offer user-oriented guides for someone who says "I want to use Arrow in X language". There is overlap and maybe they could be consolidated, but I didn't see a clean way to do it.
   
   > * The "Use Cases" page is not yet linked from the menu?
   
   Yeah I noticed that too; do you have an opinion on where the link should go?
   





[GitHub] [arrow-site] pitrou commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r439424737



##########
File path: index.html
##########
@@ -1,72 +1,62 @@
 ---
-layout: default
+layout: home
 ---
-<div class="jumbotron">
-    <h1>Apache Arrow</h1>
-    <p class="lead">A cross-language development platform for in-memory data</p>
-    <p>
-        <a class="btn btn-lg btn-success" style="white-space: normal;" href="mailto:dev-subscribe@arrow.apache.org" role="button">Join Mailing List</a>
-        <a class="btn btn-lg btn-primary" style="white-space: normal;" href="{{ site.baseurl }}/install/" role="button">Install ({{site.data.versions['current'].number}} Release - {{site.data.versions['current'].date}})</a>
-    </p>
-</div>
-<h5>
-  Interested in contributing?
-  <small class="text-muted">Join the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/"><strong>mailing list</strong></a> or check out the <a href="https://cwiki.apache.org/confluence/display/ARROW"><strong>developer wiki</strong></a>.</small>
-</h5>
-<h5>
-  <a href="{{ site.baseurl }}/blog/"><strong>See Latest News</strong></a>
-</h5>
-<p>
-  {{ site.description }}
-</p>
-<hr />
+<h1>What is Arrow?</h1>
 <div class="row">
   <div class="col-lg-4">
-      <h2 class="mt-3">Fast</h2>
-      <p>Apache Arrow&#8482; enables execution engines to take advantage of the latest SIMD (Single instruction, multiple data) operations included in modern processors, for native vectorized optimization of analytical data processing. Columnar layout is optimized for data locality for better performance on modern hardware like CPUs and GPUs.</p>
-      <p>The Arrow memory format supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <h2 class="mt-3">Format</h2>
+      <p>Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <p><a href="{{ site.baseurl }}/format/">Learn more</a> about the format or
+        <a href="{{ site.baseurl }}/docs/format/Columnar.html">read the specification</a>.</p>
   </div>
   <div class="col-lg-4">
-      <h2 class="mt-3">Flexible</h2>
-      <p>Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust implementations are in progress and more languages are welcome.
+      <h2 class="mt-3">Libraries</h2>
+      <p>The Arrow project includes libraries that implement the memory specification in many languages. They enable you to use the Arrow format as an efficient means of sharing data across languages and processes. Libraries are available for <a href="{{ site.baseurl }}/docs/c_glib/">C</a>, <a href="{{ site.baseurl }}/docs/cpp/">C++</a>, C#, Go, <a href="{{ site.baseurl }}/docs/java/">Java</a>, <a href="{{ site.baseurl }}/docs/js/">JavaScript</a>, MATLAB, <a href="{{ site.baseurl }}/docs/python/">Python</a>, <a href="{{ site.baseurl }}/docs/r/">R</a>, <a href="{{ site.baseurl }}/docs/ruby/">Ruby</a>, and Rust.

Review comment:
       Your suggestion is marked "outdated", not sure why.







[GitHub] [arrow-site] nealrichardson commented on pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-654488304


   Ok all, I'm declaring myself done with this. There's certainly more that can be written, but this is complete enough. Please take a look, feel free to flesh out any sections you think are thin, comment/haggle over wording, etc., and I'll happily incorporate suggestions. See the PR description for a link to the preview site.





[GitHub] [arrow-site] kou commented on a change in pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
kou commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r457759084



##########
File path: .github/workflows/deploy.yml
##########
@@ -45,7 +45,7 @@ jobs:
       - name: Configure for GitHub Pages on master
         run: |
           owner=$(jq --raw-output .repository.owner.login ${GITHUB_EVENT_PATH})
-          repository=$(jq .repository.name ${GITHUB_EVENT_PATH})
+          repository=$(jq --raw-output .repository.name ${GITHUB_EVENT_PATH})

Review comment:
       Ah, I didn't know that we can use `github.repository.name` here.
   I'm OK to use `github.repository.name` instead of `jq`.







[GitHub] [arrow-site] nealrichardson commented on pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-663644877


   Ok, I've merged in the 1.0 release note and 1.0 docs and deployed to the test site, and then I've updated a few links and notes on the 1.0 announcement blog post. This is ready to go live.





[GitHub] [arrow-site] nealrichardson commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r439487857



##########
File path: index.html
##########
@@ -1,72 +1,62 @@
 ---
-layout: default
+layout: home
 ---
-<div class="jumbotron">
-    <h1>Apache Arrow</h1>
-    <p class="lead">A cross-language development platform for in-memory data</p>
-    <p>
-        <a class="btn btn-lg btn-success" style="white-space: normal;" href="mailto:dev-subscribe@arrow.apache.org" role="button">Join Mailing List</a>
-        <a class="btn btn-lg btn-primary" style="white-space: normal;" href="{{ site.baseurl }}/install/" role="button">Install ({{site.data.versions['current'].number}} Release - {{site.data.versions['current'].date}})</a>
-    </p>
-</div>
-<h5>
-  Interested in contributing?
-  <small class="text-muted">Join the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/"><strong>mailing list</strong></a> or check out the <a href="https://cwiki.apache.org/confluence/display/ARROW"><strong>developer wiki</strong></a>.</small>
-</h5>
-<h5>
-  <a href="{{ site.baseurl }}/blog/"><strong>See Latest News</strong></a>
-</h5>
-<p>
-  {{ site.description }}
-</p>
-<hr />
+<h1>What is Arrow?</h1>
 <div class="row">
   <div class="col-lg-4">
-      <h2 class="mt-3">Fast</h2>
-      <p>Apache Arrow&#8482; enables execution engines to take advantage of the latest SIMD (Single instruction, multiple data) operations included in modern processors, for native vectorized optimization of analytical data processing. Columnar layout is optimized for data locality for better performance on modern hardware like CPUs and GPUs.</p>
-      <p>The Arrow memory format supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <h2 class="mt-3">Format</h2>
+      <p>Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <p><a href="{{ site.baseurl }}/format/">Learn more</a> about the format or
+        <a href="{{ site.baseurl }}/docs/format/Columnar.html">read the specification</a>.</p>
   </div>
   <div class="col-lg-4">
-      <h2 class="mt-3">Flexible</h2>
-      <p>Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust implementations are in progress and more languages are welcome.
+      <h2 class="mt-3">Libraries</h2>
+      <p>The Arrow project includes libraries that implement the memory specification in many languages. They enable you to use the Arrow format as an efficient means of sharing data across languages and processes. Libraries are available for <a href="{{ site.baseurl }}/docs/c_glib/">C</a>, <a href="{{ site.baseurl }}/docs/cpp/">C++</a>, C#, Go, <a href="{{ site.baseurl }}/docs/java/">Java</a>, <a href="{{ site.baseurl }}/docs/js/">JavaScript</a>, MATLAB, <a href="{{ site.baseurl }}/docs/python/">Python</a>, <a href="{{ site.baseurl }}/docs/r/">R</a>, <a href="{{ site.baseurl }}/docs/ruby/">Ruby</a>, and Rust.

Review comment:
       Because I already pushed links to the docs, probably







[GitHub] [arrow-site] nealrichardson commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r438850902



##########
File path: index.html
##########
@@ -1,72 +1,58 @@
 ---
-layout: default
+layout: home
 ---
-<div class="jumbotron">
-    <h1>Apache Arrow</h1>
-    <p class="lead">A cross-language development platform for in-memory data</p>
-    <p>
-        <a class="btn btn-lg btn-success" style="white-space: normal;" href="mailto:dev-subscribe@arrow.apache.org" role="button">Join Mailing List</a>
-        <a class="btn btn-lg btn-primary" style="white-space: normal;" href="{{ site.baseurl }}/install/" role="button">Install ({{site.data.versions['current'].number}} Release - {{site.data.versions['current'].date}})</a>
-    </p>
-</div>
-<h5>
-  Interested in contributing?
-  <small class="text-muted">Join the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/"><strong>mailing list</strong></a> or check out the <a href="https://cwiki.apache.org/confluence/display/ARROW"><strong>developer wiki</strong></a>.</small>
-</h5>
-<h5>
-  <a href="{{ site.baseurl }}/blog/"><strong>See Latest News</strong></a>
-</h5>
-<p>
-  {{ site.description }}
-</p>
-<hr />
+<h1>What is Arrow?</h1>
 <div class="row">
   <div class="col-lg-4">
-      <h2 class="mt-3">Fast</h2>
-      <p>Apache Arrow&#8482; enables execution engines to take advantage of the latest SIMD (Single instruction, multiple data) operations included in modern processors, for native vectorized optimization of analytical data processing. Columnar layout is optimized for data locality for better performance on modern hardware like CPUs and GPUs.</p>
-      <p>The Arrow memory format supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <h2 class="mt-3">Format</h2>
+      <p>Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <p><a href="{{ site.baseurl }}/format/">Learn more</a> about the format or
+        <a href="{{ site.baseurl }}/docs/format/Columnar.html">read the specification</a>.</p>
   </div>
   <div class="col-lg-4">
-      <h2 class="mt-3">Flexible</h2>
-      <p>Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust implementations are in progress and more languages are welcome.
+      <h2 class="mt-3">Libraries</h2>
+      <p>The Arrow project includes libraries that implement the memory specification in many languages. They enable you to use the Arrow format as an efficient means of sharing data across languages and processes. Libraries are available for <a href="{{ site.baseurl }}/docs/c_glib/">C</a>, <a href="{{ site.baseurl }}/docs/cpp/">C++</a>, C#, Go, <a href="{{ site.baseurl }}/docs/java/">Java</a>, <a href="{{ site.baseurl }}/docs/js/">JavaScript</a>, MATLAB, <a href="{{ site.baseurl }}/docs/python/">Python</a>, <a href="{{ site.baseurl }}/docs/r/">R</a>, <a href="{{ site.baseurl }}/docs/ruby/">Ruby</a>, and Rust.

Review comment:
       I linked the ones I saw under `/docs` on the website; will at least link to readmes for the others if there's no published documentation by release time.







[GitHub] [arrow-site] nisit commented on pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
nisit commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-642360231


   In my opinion, a little revamp from a marketing perspective is needed. Words that catch attention and provide quick understanding should be used: e.g., the section headings under "What is Arrow?" could be reformatted as Engine, Language Support, and Use Cases; the section headings under "Why Arrow?" could be reformatted as Lightning Fast, Wide Acceptance, and Active Community Support.





[GitHub] [arrow-site] pitrou commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r439425122



##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
 
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+## Getting involved
 
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [private@arrow.apache.org](mailto:private@arrow.apache.org).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
 
 ### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques.  It is ideal when you want to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes.  Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in a natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications.  Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
 
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+### What about "Arrow files" then?
 
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expensive CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file format".
 
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation.  Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.

Review comment:
       Should add a comma before "avoiding", IMHO. Also, I don't think deserialization costs bear any relationship with memory-mapping.







[GitHub] [arrow-site] pitrou commented on pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-660303200


   I added some C++, Python and Format entries in the 1.0 post. I think Datasets need to be covered as well (@bkietz).





[GitHub] [arrow-site] jorisvandenbossche commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r444309775



##########
File path: _includes/header.html
##########
@@ -50,22 +33,44 @@
           </a>
           <div class="dropdown-menu" aria-labelledby="navbarDropdownDocumentation">
             <a class="dropdown-item" href="{{ site.baseurl }}/docs">Project Docs</a>
-            <a class="dropdown-item" href="{{ site.baseurl }}/docs/python">Python</a>
+            <a class="dropdown-item" href="{{ site.baseurl }}/docs/format/Columnar.html">Specification</a>
+            <hr/>
+            <a class="dropdown-item" href="{{ site.baseurl }}/docs/c_glib">C GLib</a>
             <a class="dropdown-item" href="{{ site.baseurl }}/docs/cpp">C++</a>
+            <a class="dropdown-item" href="https://github.com/apache/arrow/blob/master/csharp/README.md">C#</a>
+            <a class="dropdown-item" href="https://godoc.org/github.com/apache/arrow/go/arrow">Go</a>
             <a class="dropdown-item" href="{{ site.baseurl }}/docs/java">Java</a>
-            <a class="dropdown-item" href="{{ site.baseurl }}/docs/c_glib">C GLib</a>
             <a class="dropdown-item" href="{{ site.baseurl }}/docs/js">JavaScript</a>
+            <a class="dropdown-item" href="https://github.com/apache/arrow/blob/master/matlab/README.md">MATLAB</a>
+            <a class="dropdown-item" href="{{ site.baseurl }}/docs/python">Python</a>
             <a class="dropdown-item" href="{{ site.baseurl }}/docs/r">R</a>
+            <a class="dropdown-item" href="https://github.com/apache/arrow/blob/master/ruby/README.md">Ruby</a>
+            <a class="dropdown-item" href="https://docs.rs/crate/arrow/">Rust</a>
+          </div>
+        </li>
+        <li class="nav-item dropdown">
+          <a class="nav-link dropdown-toggle" href="#"
+             id="navbarDropdownCommunity" role="button" data-toggle="dropdown"
+             aria-haspopup="true" aria-expanded="false">
+             Community
+          </a>
+          <div class="dropdown-menu" aria-labelledby="navbarDropdownCommunity">
+            <a class="dropdown-item" href="{{ site.baseurl }}/community/">Mailing Lists</a>

Review comment:
       Should we call this differently than "Mailing Lists", since it's about more than only that?

##########
File path: use_cases.md
##########
@@ -0,0 +1,98 @@
+---
+layout: default
+title: Use cases
+description: Example use cases for the Apache Arrow project
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+# Use Cases
+
+Here are some example applications of the Apache Arrow format and libraries.
+For more, see our [blog]({{ site.baseurl }}/blog/) and the list of projects
+[powered by Arrow]({{ site.baseurl }}/powered_by/).
+
+## Reading/writing columnar storage formats
+
+Many Arrow libraries provide convenient methods for reading and writing
+columnar file formats, including the Arrow IPC file format ("Feather")
+and the [Apache Parquet](https://parquet.apache.org/) format.
+
+<!-- Link to implementation matrix? -->
+
+* Feather: C++, [Python]({{ site.baseurl }}/docs/python/feather.html),
+  [R]({{ site.baseurl }}/docs/r/reference/read_feather.html)
+* Parquet: [C++]({{ site.baseurl }}/docs/cpp/parquet.html),
+  [Python]({{ site.baseurl }}/docs/python/parquet.html),
+  [R]({{ site.baseurl }}/docs/r/reference/read_parquet.html)
+
+In addition to single-file readers, some libraries (C++,
+[Python]({{ site.baseurl }}/docs/python/dataset.html),
+[R]({{ site.baseurl }}/docs/r/articles/dataset.html)) support reading
+entire directories of files and treating them as a single dataset. These
+datasets may be on the local file system or on a remote storage system, such
+as HDFS, S3, etc.
+
+## Sharing memory locally
+
+Arrow IPC files can be memory-mapped locally, which allows you to work with
+data bigger than memory and to share data across languages and processes.
+<!-- example? -->
+
+The Arrow project includes [Plasma]({% post_url 2017-08-08-plasma-in-memory-object-store %}),
+a shared-memory object store written in C++ and exposed in Python. Plasma
+holds immutable objects in shared memory so that they can be accessed
+efficiently by many clients across process boundaries.
+
+The Arrow format also defines a [C data interface]({% post_url 2020-05-04-introducing-arrow-c-data-interface %}),
+which allows zero-copy data sharing inside a single process without any
+build-time or link-time dependency requirements. This allows, for example,
+[R users to access `pyarrow`-based projects]({{ site.baseurl }}/docs/r/articles/python.html)
+using the `reticulate` package.
+
+## Moving data over the network
+
+The Arrow format allows serializing and shipping columnar data
+over the network, or any kind of streaming transport.
+[Apache Spark](https://spark.apache.org/) uses Arrow as a
+data interchange format, and both [PySpark]({% post_url 2017-07-26-spark-arrow %})
+and [sparklyr]({% post_url 2019-01-25-r-spark-improvements %}) can take
+advantage of Arrow for significant performance gains when transferring data.
+[Google BigQuery](https://cloud.google.com/bigquery/docs/reference/storage),
+[TensorFlow](https://www.tensorflow.org/tfx),
+[AWS Athena](https://docs.aws.amazon.com/athena/latest/ug/connect-to-a-data-source.html),
+and [others]({{ site.baseurl }}/powered_by/) also use Arrow similarly.
+
+The Arrow project also defines [Flight]({% post_url 2019-09-30-introducing-arrow-flight %}),
+a client-server RPC framework to build rich services exchanging data according
+to application-defined semantics.
+
+<!-- turbodbc -->
+
+## In-memory data structure for analytics
+
+The Arrow format is designed to enable fast computation. Some projects have
+begun to take advantage of that design.  Within the Apache Arrow project,
+[DataFusion]({% post_url 2019-02-04-datafusion-donation %}) is a query engine
+using Arrow data built in Rust.
+

Review comment:
       Other things we could mention here: Gandiva, and the compute functionality of C++




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] nealrichardson commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r438952798



##########
File path: format.html
##########
@@ -0,0 +1,55 @@
+---
+layout: default
+title: Format
+description: Arrow Format
+---
+
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+
+<div class="row">
+  <div class="col-md-6">
+    <h2>Performance Advantage of Columnar In-Memory</h2>

Review comment:
       It probably doesn't (though this file would have to be renamed to .md to be rendered). There are some things the markdown renderer can't do right, but this would probably be fine. This was just copied from the original homepage and was already html.







[GitHub] [arrow-site] nealrichardson commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r445201518



##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
 
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+## Getting involved
 
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [private@arrow.apache.org](mailto:private@arrow.apache.org).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
 
 ### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques.  It is ideal when you want to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes.  Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in a natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications.  Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
 
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+### What about "Arrow files" then?
 
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expensive CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file format".
 
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation.  Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.
 
-What about "Arrow files" then? Apache Arrow defines a binary "serialization" protocol for arranging a collection of Arrow columnar arrays (called a "record batch") that can be used for messaging and interprocess communication. You can put the protocol anywhere, including on disk, which can later be memory-mapped or read into memory and sent elsewhere.
+Some things to keep in mind when comparing the Arrow IPC file format and the
+Parquet format:
 
-This Arrow protocol is designed so that you can "map" a blob of Arrow data without doing any deserialization, so performing analytics on Arrow protocol data on disk can use memory-mapping and pay effectively zero cost. The protocol is used for many other things as well, such as streaming data between Spark SQL and Python for running pandas functions against chunks of Spark SQL data (these are called "pandas udfs").
+* Parquet is safe for long-term storage and archival purposes, meaning if
+  you write a file today, you can expect that any system that says it can
+  "read Parquet" will be able to read the file in 5 years or 10 years.
+  We are not yet making this assertion about long-term stability of the Arrow
+  format.

Review comment:
       Why not?







[GitHub] [arrow-site] nealrichardson commented on pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-654923863


   > I'm a bit surprised by the very short home page but I suppose there was a reasoning behind that.
   
   Lack of content and/or inspiration, mainly. In the latest revision, I tried something along the lines of https://kudu.apache.org/, with a short home page plus a longer overview page. But in the end, the overview page I ended up with wasn't that long. So we could fold that back into the home page and delete the overview page for now. I think there's value in a page that explains the origin and value proposition of Arrow, and that that (full) discussion doesn't belong on the home page. But if we aren't able right now to do that page properly, maybe we should cut it and fold some of the content back to the home page.
   





[GitHub] [arrow-site] pitrou commented on pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-645932505


   Perhaps the Getting started and Install pages should be per-implementation, rather than global. The current draft for the Getting started page is just a bunch of links to other parts of the documentation.





[GitHub] [arrow-site] nealrichardson commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r445204028



##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
 
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+## Getting involved
 
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [private@arrow.apache.org](mailto:private@arrow.apache.org).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
 
 ### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques.  It is ideal when you want to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes.  Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in a natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications.  Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
 
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+### What about "Arrow files" then?
 
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expensive CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file format".
 
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation.  Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.
 
-What about "Arrow files" then? Apache Arrow defines a binary "serialization" protocol for arranging a collection of Arrow columnar arrays (called a "record batch") that can be used for messaging and interprocess communication. You can put the protocol anywhere, including on disk, which can later be memory-mapped or read into memory and sent elsewhere.
+Some things to keep in mind when comparing the Arrow IPC file format and the
+Parquet format:
 
-This Arrow protocol is designed so that you can "map" a blob of Arrow data without doing any deserialization, so performing analytics on Arrow protocol data on disk can use memory-mapping and pay effectively zero cost. The protocol is used for many other things as well, such as streaming data between Spark SQL and Python for running pandas functions against chunks of Spark SQL data (these are called "pandas udfs").
+* Parquet is safe for long-term storage and archival purposes, meaning if
+  you write a file today, you can expect that any system that says it can
+  "read Parquet" will be able to read the file in 5 years or 10 years.
+  We are not yet making this assertion about long-term stability of the Arrow
+  format.
 
-In some applications, Parquet and Arrow can be used interchangeably for on-disk data serialization. Some things to keep in mind:
+* Reading Parquet files generally requires expensive decoding, while reading
+  Arrow IPC files is just a matter of transferring raw bytes from the storage
+  hardware.

Review comment:
       ... except when the columns are compressed?







[GitHub] [arrow-site] robert-wagner commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
robert-wagner commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r438790485



##########
File path: index.html
##########
@@ -1,72 +1,58 @@
 ---
-layout: default
+layout: home
 ---
-<div class="jumbotron">
-    <h1>Apache Arrow</h1>
-    <p class="lead">A cross-language development platform for in-memory data</p>
-    <p>
-        <a class="btn btn-lg btn-success" style="white-space: normal;" href="mailto:dev-subscribe@arrow.apache.org" role="button">Join Mailing List</a>
-        <a class="btn btn-lg btn-primary" style="white-space: normal;" href="{{ site.baseurl }}/install/" role="button">Install ({{site.data.versions['current'].number}} Release - {{site.data.versions['current'].date}})</a>
-    </p>
-</div>
-<h5>
-  Interested in contributing?
-  <small class="text-muted">Join the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/"><strong>mailing list</strong></a> or check out the <a href="https://cwiki.apache.org/confluence/display/ARROW"><strong>developer wiki</strong></a>.</small>
-</h5>
-<h5>
-  <a href="{{ site.baseurl }}/blog/"><strong>See Latest News</strong></a>
-</h5>
-<p>
-  {{ site.description }}
-</p>
-<hr />
+<h1>What is Arrow?</h1>
 <div class="row">
   <div class="col-lg-4">
-      <h2 class="mt-3">Fast</h2>
-      <p>Apache Arrow&#8482; enables execution engines to take advantage of the latest SIMD (Single instruction, multiple data) operations included in modern processors, for native vectorized optimization of analytical data processing. Columnar layout is optimized for data locality for better performance on modern hardware like CPUs and GPUs.</p>
-      <p>The Arrow memory format supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <h2 class="mt-3">Format</h2>
+      <p>Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <p><a href="{{ site.baseurl }}/format/">Learn more</a> about the format or
+        <a href="{{ site.baseurl }}/docs/format/Columnar.html">read the specification</a>.</p>
   </div>
   <div class="col-lg-4">
-      <h2 class="mt-3">Flexible</h2>
-      <p>Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust implementations are in progress and more languages are welcome.
+      <h2 class="mt-3">Libraries</h2>
+      <p>The Arrow project includes libraries that implement the memory specification in many languages. They enable you to use the Arrow format as an efficient means of sharing data across languages and processes. Libraries are available for <a href="{{ site.baseurl }}/docs/c_glib/">C</a>, <a href="{{ site.baseurl }}/docs/cpp/">C++</a>, C#, Go, <a href="{{ site.baseurl }}/docs/java/">Java</a>, <a href="{{ site.baseurl }}/docs/js/">JavaScript</a>, MATLAB, <a href="{{ site.baseurl }}/docs/python/">Python</a>, <a href="{{ site.baseurl }}/docs/r/">R</a>, <a href="{{ site.baseurl }}/docs/ruby/">Ruby</a>, and Rust.

Review comment:
       Is there Rust/MATLAB/C#/Go documentation, or is that something that still needs to be made, external to this PR?







[GitHub] [arrow-site] lidavidm commented on a change in pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r456432659



##########
File path: _posts/2020-07-16-1.0.0-release.md
##########
@@ -0,0 +1,90 @@
+---
+layout: post
+title: "Apache Arrow 1.0.0 Release"
+date: "2020-07-16 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 1.0.0 release. This covers
+over 2 months of development work and includes [**XX resolved issues**][1]
+from [**XX distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## 1.0 Format Release
+
+The 1.0 major release indicates that the Arrow columnar format is declared
+stable, with [forward and backward compatibility guarantees][5].
+
+Integration testing, link to implementation matrix
+
+Format changes: unions, unsigned int dictionary indices, decimal bit width, feature enum.
+
+## Community
+
+Since the last release, we have added two new committers:
+
+* Liya Fan
+* Ji Liu
+
+Thank you for all your contributions!
+
+<!-- Acknowledge and link to any new committers and PMC members since the last release. See previous release announcements for examples. -->
+
+## Arrow Flight RPC notes
+

Review comment:
       ```suggestion
   Flight now offers DoExchange, a fully bidirectional data endpoint, in addition to DoGet and DoPut, in C++, Java, and Python. Middleware in all languages now expose binary-valued headers. Additionally, servers and clients can set Arrow IPC read/write options in all languages, making compatibility easier with earlier versions of Arrow Flight.
   
   In C++ and Python, Flight now exposes more options from gRPC, including the address of the client (on the server) and the ability to set low-level gRPC client options. Flight also supports mTLS for authentication and the ability for a client to control the size of a data message on the wire.
   ```
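
   Since DoExchange is the headline addition, here is a minimal sketch of a bidirectional echo exchange in Python. This is an illustration, not text from the release notes: the `EchoServer` class and the `"echo"` command name are hypothetical, and it assumes the `pyarrow.flight` API as of the 1.0 release.

   ```python
   import pyarrow as pa
   import pyarrow.flight as flight

   class EchoServer(flight.FlightServerBase):
       """Hypothetical server that echoes every uploaded batch back."""
       def do_exchange(self, context, descriptor, reader, writer):
           started = False
           for chunk in reader:
               if chunk.data is not None:
                   if not started:
                       # Begin the response stream with the client's schema.
                       writer.begin(chunk.data.schema)
                       started = True
                   writer.write_batch(chunk.data)

   # The server starts listening on construction; port 0 picks a free port.
   server = EchoServer("grpc://localhost:0")
   client = flight.FlightClient(f"grpc://localhost:{server.port}")

   # do_exchange returns a writer and a reader: write and read on one call.
   writer, reader = client.do_exchange(flight.FlightDescriptor.for_command(b"echo"))
   table = pa.table({"x": [1, 2, 3]})
   writer.begin(table.schema)
   for batch in table.to_batches():
       writer.write_batch(batch)
   writer.done_writing()          # signal end of upload so the echo completes
   echoed = reader.read_all()     # a table equal to the one we sent
   server.shutdown()
   ```

   The same exchange could previously only be approximated with a DoPut followed by a separate DoGet.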




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] rymurr commented on a change in pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r456430904



##########
File path: _posts/2020-07-16-1.0.0-release.md
##########
@@ -0,0 +1,90 @@
+---
+layout: post
+title: "Apache Arrow 1.0.0 Release"
+date: "2020-07-16 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 1.0.0 release. This covers
+over 2 months of development work and includes [**XX resolved issues**][1]
+from [**XX distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## 1.0 Format Release
+
+The 1.0 major release indicates that the Arrow columnar format is declared
+stable, with [forward and backward compatibility guarantees][5].
+
+Integration testing, link to implementation matrix
+
+Format changes: unions, unsigned int dictionary indices, decimal bit width, feature enum.
+
+## Community
+
+Since the last release, we have added two new committers:
+
+* Liya Fan
+* Ji Liu
+
+Thank you for all your contributions!
+
+<!-- Acknowledge and link to any new committers and PMC members since the last release. See previous release announcements for examples. -->
+
+## Arrow Flight RPC notes
+
+## C++ notes
+
+## C# notes
+
+## Go notes
+
+## Java notes
+

Review comment:
       The Java package introduces a number of low-level changes in this release. Most notable are the work to support allocating large Arrow buffers and the removal of Netty from the public API. Users will have to update their dependencies to use one of the two supported allocators: Netty (`arrow-memory-netty`) or Unsafe (`arrow-memory-unsafe`, which uses internal Java APIs for direct memory).
   
   The Java Vector implementation has improved its interoperability: `LargeVarChar`, `LargeBinary`, `LargeList`, `Union`, extension types, and duplicate field names in `Struct`s have been verified as binary compatible with C++ and the specification.
   
   For more on what's in the 1.0.0 Java package, see the [Java changelog][5].
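
   For Java users following along, the allocator dependency change described above would look something like this in a Maven `pom.xml`. This is a sketch, not an official snippet from the release notes; the version and scope may differ for your build:

   ```xml
   <!-- Pick exactly one allocator implementation. -->
   <dependency>
     <groupId>org.apache.arrow</groupId>
     <artifactId>arrow-memory-netty</artifactId>
     <version>1.0.0</version>
     <scope>runtime</scope>
   </dependency>
   <!-- ...or, instead, org.apache.arrow:arrow-memory-unsafe. -->
   ```

   Without one of the two allocator artifacts on the classpath, constructing a `RootAllocator` will fail at runtime.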







[GitHub] [arrow-site] fsaintjacques commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r439406667



##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
 
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+## Getting involved
 
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [private@arrow.apache.org](mailto:private@arrow.apache.org).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
 
 ### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques.  It is ideal when wanting to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes.  Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications.  Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
 
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+### What about "Arrow files" then?
 
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expensive CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file format".
 
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation.  Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.

Review comment:
       ```suggestion
   IPC file can use memory-mapping, avoiding any deserialization cost and extra copies.
   ```
   
   You still need to copy from disk to memory.
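
   To make the memory-mapping point concrete, a small sketch in Python (assuming `pyarrow`; the file path is arbitrary):

   ```python
   import os
   import tempfile

   import pyarrow as pa
   import pyarrow.ipc as ipc

   table = pa.table({"x": [1, 2, 3], "y": ["a", "b", "c"]})
   path = os.path.join(tempfile.mkdtemp(), "data.arrow")

   # Write the table using the Arrow IPC file format.
   with pa.OSFile(path, "wb") as sink:
       with ipc.new_file(sink, table.schema) as writer:
           writer.write_table(table)

   # Memory-map it back: the loaded table's buffers point directly into
   # the mapped pages, so there is no deserialization step -- though, as
   # noted above, the OS still reads the bytes from disk on first access.
   with pa.memory_map(path, "r") as source:
       loaded = ipc.open_file(source).read_all()
   ```

   Contrast this with a format like Parquet, where reading back requires decoding into new in-memory buffers.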

##########
File path: .github/workflows/deploy.yml
##########
@@ -36,16 +36,16 @@ jobs:
       - name: Configure for production
         run: |
           echo "BASE_URL=" >> ../env.sh
-          echo "ORIGIN=$(jq --raw-output .repository.full_name ${GITHUB_EVENT_PATH})" >> ../env.sh
+          echo "ORIGIN=${{ github.repository }}" >> ../env.sh
           echo "TARGET_BRANCH=asf-site" >> ../env.sh
           echo >> _extra_config.yml
         if: |
           github.event_name == 'push' &&
             github.repository == 'apache/arrow-site'
       - name: Configure for GitHub Pages on master
         run: |
-          owner=$(jq .repository.owner.login ${GITHUB_EVENT_PATH})
-          repository=$(jq .repository.name ${GITHUB_EVENT_PATH})
+          owner=$(jq .repository.owner.login ${GITHUB_EVENT_PATH} | tr -d '"')

Review comment:
       `jq -r` will remove the quotes.
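
   For anyone following along, the difference between the two (assuming `jq` is installed; the sample event JSON is made up):

   ```shell
   event='{"repository": {"owner": {"login": "apache"}, "name": "arrow-site"}}'

   # Without -r, jq prints JSON-encoded output, quotes included:
   echo "$event" | jq .repository.name
   # "arrow-site"

   # With -r (--raw-output), strings are printed raw, no quotes:
   echo "$event" | jq -r .repository.name
   # arrow-site
   ```

   So `jq -r` replaces the `jq ... | tr -d '"'` pipeline in one flag.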

##########
File path: index.html
##########
@@ -1,72 +1,62 @@
 ---
-layout: default
+layout: home
 ---
-<div class="jumbotron">
-    <h1>Apache Arrow</h1>
-    <p class="lead">A cross-language development platform for in-memory data</p>
-    <p>
-        <a class="btn btn-lg btn-success" style="white-space: normal;" href="mailto:dev-subscribe@arrow.apache.org" role="button">Join Mailing List</a>
-        <a class="btn btn-lg btn-primary" style="white-space: normal;" href="{{ site.baseurl }}/install/" role="button">Install ({{site.data.versions['current'].number}} Release - {{site.data.versions['current'].date}})</a>
-    </p>
-</div>
-<h5>
-  Interested in contributing?
-  <small class="text-muted">Join the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/"><strong>mailing list</strong></a> or check out the <a href="https://cwiki.apache.org/confluence/display/ARROW"><strong>developer wiki</strong></a>.</small>
-</h5>
-<h5>
-  <a href="{{ site.baseurl }}/blog/"><strong>See Latest News</strong></a>
-</h5>
-<p>
-  {{ site.description }}
-</p>
-<hr />
+<h1>What is Arrow?</h1>
 <div class="row">
   <div class="col-lg-4">
-      <h2 class="mt-3">Fast</h2>
-      <p>Apache Arrow&#8482; enables execution engines to take advantage of the latest SIMD (Single instruction, multiple data) operations included in modern processors, for native vectorized optimization of analytical data processing. Columnar layout is optimized for data locality for better performance on modern hardware like CPUs and GPUs.</p>
-      <p>The Arrow memory format supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <h2 class="mt-3">Format</h2>
+      <p>Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <p><a href="{{ site.baseurl }}/format/">Learn more</a> about the format or
+        <a href="{{ site.baseurl }}/docs/format/Columnar.html">read the specification</a>.</p>
   </div>
   <div class="col-lg-4">
-      <h2 class="mt-3">Flexible</h2>
-      <p>Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust implementations are in progress and more languages are welcome.
+      <h2 class="mt-3">Libraries</h2>
+      <p>The Arrow project includes libraries that implement the memory specification in many languages. They enable you to use the Arrow format as an efficient means of sharing data across languages and processes. Libraries are available for <a href="{{ site.baseurl }}/docs/c_glib/">C</a>, <a href="{{ site.baseurl }}/docs/cpp/">C++</a>, C#, Go, <a href="{{ site.baseurl }}/docs/java/">Java</a>, <a href="{{ site.baseurl }}/docs/js/">JavaScript</a>, MATLAB, <a href="{{ site.baseurl }}/docs/python/">Python</a>, <a href="{{ site.baseurl }}/docs/r/">R</a>, <a href="{{ site.baseurl }}/docs/ruby/">Ruby</a>, and Rust.

Review comment:
       ```suggestion
         <p>The Arrow project includes libraries that implement the memory specification in many languages. They enable you to use the Arrow format as an efficient means of sharing data across languages and processes. Libraries are available for <a href="{{ site.baseurl }}/docs/c_glib/">C</a>, <a href="{{ site.baseurl }}/docs/cpp/">C++</a>, C#, <a href="https://godoc.org/github.com/apache/arrow/go/arrow">Go</a>, <a href="{{ site.baseurl }}/docs/java/">Java</a>, <a href="{{ site.baseurl }}/docs/js/">JavaScript</a>, MATLAB, <a href="{{ site.baseurl }}/docs/python/">Python</a>, <a href="{{ site.baseurl }}/docs/r/">R</a>, <a href="{{ site.baseurl }}/docs/ruby/">Ruby</a>, and Rust.
   ```
   
   Go packages provides automatic documentation, see [arrow](https://godoc.org/github.com/apache/arrow/go/arrow).







[GitHub] [arrow-site] rymurr commented on pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-660098570


   Hey @nealrichardson I have added a first cut of the Java changes as a comment to the blog. I couldn't figure out how to add it directly to the blog. Hopefully that helps as a first pass, and the Java committers can refine it





[GitHub] [arrow-site] pitrou commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r438956548



##########
File path: faq.md
##########
@@ -24,32 +24,160 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Flight RPC), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+Some implementations of Arrow are more complete and more stable than others.
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).

Review comment:
       Ah... annoying.







[GitHub] [arrow-site] nealrichardson commented on a change in pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r458325181



##########
File path: .github/workflows/deploy.yml
##########
@@ -45,7 +45,7 @@ jobs:
       - name: Configure for GitHub Pages on master
         run: |
           owner=$(jq --raw-output .repository.owner.login ${GITHUB_EVENT_PATH})
-          repository=$(jq .repository.name ${GITHUB_EVENT_PATH})
+          repository=$(jq --raw-output .repository.name ${GITHUB_EVENT_PATH})

Review comment:
       I'm not sure that `github.repository.name` is a thing. `github.repository` is a string, per https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#github-context
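
   As a concrete alternative (a sketch, not from the PR itself): since the repository name is an `owner/name` string, the two halves can be split with plain shell parameter expansion on the standard `GITHUB_REPOSITORY` environment variable, no `jq` needed:

   ```shell
   # GitHub Actions sets this to "owner/name"; hard-coded here for illustration.
   GITHUB_REPOSITORY="apache/arrow-site"

   owner=${GITHUB_REPOSITORY%%/*}          # strip everything from the first "/"
   repository=${GITHUB_REPOSITORY##*/}     # strip everything through the last "/"

   echo "$owner"
   # apache
   echo "$repository"
   # arrow-site
   ```

   This also sidesteps the quoting question entirely, since no JSON is parsed.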







[GitHub] [arrow-site] rymurr commented on a change in pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r456430904



##########
File path: _posts/2020-07-16-1.0.0-release.md
##########
@@ -0,0 +1,90 @@
+---
+layout: post
+title: "Apache Arrow 1.0.0 Release"
+date: "2020-07-16 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 1.0.0 release. This covers
+over 2 months of development work and includes [**XX resolved issues**][1]
+from [**XX distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## 1.0 Format Release
+
+The 1.0 major release indicates that the Arrow columnar format is declared
+stable, with [forward and backward compatibility guarantees][5].
+
+Integration testing, link to implementation matrix
+
+Format changes: unions, unsigned int dictionary indices, decimal bit width, feature enum.
+
+## Community
+
+Since the last release, we have added two new committers:
+
+* Liya Fan
+* Ji Liu
+
+Thank you for all your contributions!
+
+<!-- Acknowledge and link to any new committers and PMC members since the last release. See previous release announcements for examples. -->
+
+## Arrow Flight RPC notes
+
+## C++ notes
+
+## C# notes
+
+## Go notes
+
+## Java notes
+

Review comment:
       ```suggestion
   The Java package introduces a number of low-level changes in this release. Most notable are the work to support allocating large Arrow buffers and the removal of Netty from the public API. Users will have to update their dependencies to use one of the two supported allocators: Netty (`arrow-memory-netty`) or Unsafe (`arrow-memory-unsafe`, which uses internal Java APIs for direct memory).
   
   The Java Vector implementation has improved its interoperability: `LargeVarChar`, `LargeBinary`, `LargeList`, `Union`, extension types, and duplicate field names in `Struct`s have been verified as binary compatible with C++ and the specification.
   ``` 







[GitHub] [arrow-site] jorisvandenbossche commented on pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-648241609


   @nealrichardson you shouldn't downplay your visual design skills too much, it's in any case already better than what we have now ;)
   
   Some quick general comments from looking at the preview site (not necessarily related to the changes here though, might be issues on the existing site as well):
   
   * The "Get Arrow" (dropdown) -> "Releases" page lists the changelogs for each release, but there is no link from here to the more human-readable release blogposts (while those post do link to the changelog). Maybe we could have both links in the bullet points?
   * The "Get Arrow" -> "Getting Started" and "Install" pages could maybe be combined? Or at least the difference in scope is not directly clear to me (eg for R it basically mentions the same)
   * The "Use Cases" page is not yet linked from the menu?
   





[GitHub] [arrow-site] wesm commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r445204629



##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
 
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+## Getting involved
 
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [private@arrow.apache.org](mailto:private@arrow.apache.org).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
 
 ### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques.  It is ideal when wanting to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes.  Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications.  Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
 
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+### What about "Arrow files" then?
 
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expense of CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file format".
 
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation.  Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.
 
-What about "Arrow files" then? Apache Arrow defines a binary "serialization" protocol for arranging a collection of Arrow columnar arrays (called a "record batch") that can be used for messaging and interprocess communication. You can put the protocol anywhere, including on disk, which can later be memory-mapped or read into memory and sent elsewhere.
+Some things to keep in mind when comparing the Arrow IPC file format and the
+Parquet format:
 
-This Arrow protocol is designed so that you can "map" a blob of Arrow data without doing any deserialization, so performing analytics on Arrow protocol data on disk can use memory-mapping and pay effectively zero cost. The protocol is used for many other things as well, such as streaming data between Spark SQL and Python for running pandas functions against chunks of Spark SQL data (these are called "pandas udfs").
+* Parquet is safe for long-term storage and archival purposes, meaning that
+  if you write a file today, you can expect that any system that says it can
+  "read Parquet" will be able to read the file in 5 or 10 years.
+  We are not yet making this assertion about long-term stability of the Arrow
+  format.

Review comment:
       Projects that have "long-term archival storage" as a goal prioritize different things, like data integrity issues that occur when hard drives decay, or trying to reduce the on-disk footprint of data, and so forth. Maybe instead of "not intended" it's rather "does not prioritize the requirements of long-term archival storage." 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] jorisvandenbossche commented on pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-648783201


   > > The "Use Cases" page is not yet linked from the menu?
   >
   > Yeah I noticed that too; do you have an opinion on where the link should go?
   
   I would give it a prominent place, as I think this is important for people actually understanding what Arrow is about or can do/enable. 
   I would even say that this (or some highlights of it, linking to the full page) could fit on the home page (only not directly sure in what form).
   
   What is the idea for the "Overview" page? It could maybe also fit there?
   
   > The Install page currently is a reference of release artifacts and package managers, many of which cut across languages, while Getting Started would offer user-oriented guides for someone who says "I want to use Arrow in X language". 
   
   Ah, sorry, I missed (although it's clearly stated, I see now) that the install page is a reference of the release artifacts. Do we need this for Apache in this form? But it should indeed not be the first entry point for getting started (e.g. for Python, conda packages are not official release artifacts... which gives a strange order on the page)





[GitHub] [arrow-site] nisit commented on pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
nisit commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-642345992


   "A cross-language development platform for in-memory analytics" . . . Where is the stress? Is it on in-memory analytics or on cross-language?
   
   My suggestion would be: "An in-memory analytics development platform covering/spanning cross-language/s" 





[GitHub] [arrow-site] pitrou commented on pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-649640382


   We should be accessible to people who read this the first time. "Columnar IPC protocol" is a bit puzzling. I think it's easier to understand if we explain that there is an in-memory format, and there's an IPC protocol based on the in-memory format (with an additional transport and metadata layer).





[GitHub] [arrow-site] kou commented on a change in pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
kou commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r458396149



##########
File path: .github/workflows/deploy.yml
##########
@@ -45,7 +45,7 @@ jobs:
       - name: Configure for GitHub Pages on master
         run: |
           owner=$(jq --raw-output .repository.owner.login ${GITHUB_EVENT_PATH})
-          repository=$(jq .repository.name ${GITHUB_EVENT_PATH})
+          repository=$(jq --raw-output .repository.name ${GITHUB_EVENT_PATH})

Review comment:
       Sorry. I didn't check `github.repository.name`.
   It seems that there is no suitable data in `github`. We need to use `jq` for this.







[GitHub] [arrow-site] wesm commented on a change in pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r456955278



##########
File path: _data/committers.yml
##########
@@ -0,0 +1,209 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to you under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Database of Apache Arrow committers and PMC
+#
+- name: Jacques Nadeau
+  role: VP
+  alias: jacques
+  affiliation: Dremio
+- name: Ted Dunning
+  role: PMC
+  alias: tdunning
+  affiliation: MapR
+- name: P. Taylor Goetz
+  role: PMC
+  alias: ptgoetz
+  affiliation: Monetate
+- name: Julian Hyde
+  role: PMC
+  alias: jhyde
+  affiliation: Looker
+- name: Reynold Xin
+  role: PMC
+  alias: rxin
+  affiliation: Databricks
+- name: James Taylor
+  role: PMC
+  alias: jamestaylor
+  affiliation: Salesforce
+- name: Julien Le Dem
+  role: PMC
+  alias: julien
+  affiliation: WeWork
+- name: Jake Luciani
+  role: PMC
+  alias: jake
+  affiliation: DataStax
+- name: Jason Altekruse
+  role: PMC
+  alias: json
+  affiliation: Workday
+- name: Alex Levenson
+  role: PMC
+  alias: alexlevenson
+  affiliation: Twitter
+- name: Parth Chandra
+  role: PMC
+  alias: parthc
+  affiliation: MapR
+- name: Marcel Kornacker
+  role: PMC
+  alias: marcel
+  affiliation: Independent
+- name: Steven Phillips
+  role: PMC
+  alias: smp
+  affiliation: Dremio
+- name: Hanifi Gunes
+  role: PMC
+  alias: hg
+  affiliation: MZ
+- name: Abdelhakim Deneche
+  role: PMC
+  alias: adeneche
+  affiliation: Dremio
+- name: Wes McKinney
+  role: PMC
+  alias: wesm
+  affiliation: Ursa Labs / RStudio
+- name: David Alves
+  role: Committer
+  alias: dralves
+  affiliation: Cloudera
+- name: Ippokratis Pandis
+  role: Committer
+  alias: ippokratis
+  affiliation: Amazon
+- name: Uwe L. Korn
+  role: PMC
+  alias: uwe
+  affiliation: Blue Yonder GmbH

Review comment:
       Quantco

##########
File path: _data/committers.yml
##########
@@ -0,0 +1,209 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to you under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Database of Apache Arrow committers and PMC
+#
+- name: Jacques Nadeau
+  role: VP
+  alias: jacques
+  affiliation: Dremio
+- name: Ted Dunning
+  role: PMC
+  alias: tdunning
+  affiliation: MapR
+- name: P. Taylor Goetz
+  role: PMC
+  alias: ptgoetz
+  affiliation: Monetate
+- name: Julian Hyde
+  role: PMC
+  alias: jhyde
+  affiliation: Looker
+- name: Reynold Xin
+  role: PMC
+  alias: rxin
+  affiliation: Databricks
+- name: James Taylor
+  role: PMC
+  alias: jamestaylor
+  affiliation: Salesforce
+- name: Julien Le Dem
+  role: PMC
+  alias: julien
+  affiliation: WeWork

Review comment:
       Change to Datakin

##########
File path: _data/committers.yml
##########
@@ -0,0 +1,209 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to you under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Database of Apache Arrow committers and PMC
+#
+- name: Jacques Nadeau
+  role: VP
+  alias: jacques
+  affiliation: Dremio
+- name: Ted Dunning
+  role: PMC
+  alias: tdunning
+  affiliation: MapR
+- name: P. Taylor Goetz
+  role: PMC
+  alias: ptgoetz
+  affiliation: Monetate
+- name: Julian Hyde
+  role: PMC
+  alias: jhyde
+  affiliation: Looker
+- name: Reynold Xin
+  role: PMC
+  alias: rxin
+  affiliation: Databricks
+- name: James Taylor
+  role: PMC
+  alias: jamestaylor
+  affiliation: Salesforce
+- name: Julien Le Dem
+  role: PMC
+  alias: julien
+  affiliation: WeWork
+- name: Jake Luciani
+  role: PMC
+  alias: jake
+  affiliation: DataStax
+- name: Jason Altekruse
+  role: PMC
+  alias: json
+  affiliation: Workday
+- name: Alex Levenson
+  role: PMC
+  alias: alexlevenson
+  affiliation: Twitter
+- name: Parth Chandra
+  role: PMC
+  alias: parthc
+  affiliation: MapR
+- name: Marcel Kornacker
+  role: PMC
+  alias: marcel
+  affiliation: Independent
+- name: Steven Phillips
+  role: PMC
+  alias: smp
+  affiliation: Dremio
+- name: Hanifi Gunes
+  role: PMC
+  alias: hg
+  affiliation: MZ
+- name: Abdelhakim Deneche
+  role: PMC
+  alias: adeneche
+  affiliation: Dremio
+- name: Wes McKinney
+  role: PMC
+  alias: wesm
+  affiliation: Ursa Labs / RStudio
+- name: David Alves
+  role: Committer
+  alias: dralves
+  affiliation: Cloudera
+- name: Ippokratis Pandis
+  role: Committer
+  alias: ippokratis
+  affiliation: Amazon
+- name: Uwe L. Korn
+  role: PMC
+  alias: uwe
+  affiliation: Blue Yonder GmbH
+- name: Kouhei Sutou
+  role: PMC
+  alias: kou
+  affiliation: ClearCode
+- name: Philipp Moritz
+  role: PMC
+  alias: pcmoritz
+  affiliation: UC Berkeley RISELab
+- name: Phillip Cloud
+  role: PMC
+  alias: cpcloud
+  affiliation: Two Sigma
+- name: Bryan Cutler
+  role: Committer
+  alias: cutlerb
+  affiliation: IBM
+- name: Li Jin
+  role: Committer
+  alias: icexelloss
+  affiliation: Two Sigma
+- name: Siddharth Teotia
+  role: PMC
+  alias: siddteotia
+  affiliation: Dremio
+- name: Brian Hulette
+  role: Committer
+  alias: bhulette
+  affiliation: Google
+- name: Robert Nishihara
+  role: Committer
+  alias: robertnishihara
+  affiliation: UC Berkeley RISELab
+- name: Paul Taylor
+  role: Committer
+  alias: ptaylor
+  affiliation: Graphistry

Review comment:
       NVIDIA

##########
File path: _data/committers.yml
##########
@@ -0,0 +1,209 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to you under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Database of Apache Arrow committers and PMC
+#
+- name: Jacques Nadeau
+  role: VP
+  alias: jacques
+  affiliation: Dremio
+- name: Ted Dunning
+  role: PMC
+  alias: tdunning
+  affiliation: MapR
+- name: P. Taylor Goetz
+  role: PMC
+  alias: ptgoetz
+  affiliation: Monetate
+- name: Julian Hyde
+  role: PMC
+  alias: jhyde
+  affiliation: Looker
+- name: Reynold Xin
+  role: PMC
+  alias: rxin
+  affiliation: Databricks
+- name: James Taylor
+  role: PMC
+  alias: jamestaylor
+  affiliation: Salesforce
+- name: Julien Le Dem
+  role: PMC
+  alias: julien
+  affiliation: WeWork
+- name: Jake Luciani
+  role: PMC
+  alias: jake
+  affiliation: DataStax
+- name: Jason Altekruse
+  role: PMC
+  alias: json
+  affiliation: Workday
+- name: Alex Levenson
+  role: PMC
+  alias: alexlevenson
+  affiliation: Twitter
+- name: Parth Chandra
+  role: PMC
+  alias: parthc
+  affiliation: MapR
+- name: Marcel Kornacker
+  role: PMC
+  alias: marcel
+  affiliation: Independent
+- name: Steven Phillips
+  role: PMC
+  alias: smp
+  affiliation: Dremio
+- name: Hanifi Gunes
+  role: PMC
+  alias: hg
+  affiliation: MZ
+- name: Abdelhakim Deneche
+  role: PMC
+  alias: adeneche
+  affiliation: Dremio
+- name: Wes McKinney
+  role: PMC
+  alias: wesm
+  affiliation: Ursa Labs / RStudio
+- name: David Alves
+  role: Committer
+  alias: dralves
+  affiliation: Cloudera
+- name: Ippokratis Pandis
+  role: Committer
+  alias: ippokratis
+  affiliation: Amazon
+- name: Uwe L. Korn
+  role: PMC
+  alias: uwe
+  affiliation: Blue Yonder GmbH
+- name: Kouhei Sutou
+  role: PMC
+  alias: kou
+  affiliation: ClearCode
+- name: Philipp Moritz
+  role: PMC
+  alias: pcmoritz
+  affiliation: UC Berkeley RISELab
+- name: Phillip Cloud
+  role: PMC
+  alias: cpcloud
+  affiliation: Two Sigma
+- name: Bryan Cutler
+  role: Committer
+  alias: cutlerb
+  affiliation: IBM
+- name: Li Jin
+  role: Committer
+  alias: icexelloss
+  affiliation: Two Sigma
+- name: Siddharth Teotia
+  role: PMC
+  alias: siddteotia
+  affiliation: Dremio

Review comment:
       LinkedIn

##########
File path: _data/committers.yml
##########
@@ -0,0 +1,209 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to you under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Database of Apache Arrow committers and PMC
+#
+- name: Jacques Nadeau
+  role: VP
+  alias: jacques
+  affiliation: Dremio
+- name: Ted Dunning
+  role: PMC
+  alias: tdunning
+  affiliation: MapR
+- name: P. Taylor Goetz
+  role: PMC
+  alias: ptgoetz
+  affiliation: Monetate
+- name: Julian Hyde
+  role: PMC
+  alias: jhyde
+  affiliation: Looker
+- name: Reynold Xin
+  role: PMC
+  alias: rxin
+  affiliation: Databricks
+- name: James Taylor
+  role: PMC
+  alias: jamestaylor
+  affiliation: Salesforce
+- name: Julien Le Dem
+  role: PMC
+  alias: julien
+  affiliation: WeWork
+- name: Jake Luciani
+  role: PMC
+  alias: jake
+  affiliation: DataStax
+- name: Jason Altekruse
+  role: PMC
+  alias: json
+  affiliation: Workday
+- name: Alex Levenson
+  role: PMC
+  alias: alexlevenson
+  affiliation: Twitter
+- name: Parth Chandra
+  role: PMC
+  alias: parthc
+  affiliation: MapR
+- name: Marcel Kornacker
+  role: PMC
+  alias: marcel
+  affiliation: Independent
+- name: Steven Phillips
+  role: PMC
+  alias: smp
+  affiliation: Dremio
+- name: Hanifi Gunes
+  role: PMC
+  alias: hg
+  affiliation: MZ
+- name: Abdelhakim Deneche
+  role: PMC
+  alias: adeneche
+  affiliation: Dremio
+- name: Wes McKinney
+  role: PMC
+  alias: wesm
+  affiliation: Ursa Labs / RStudio
+- name: David Alves
+  role: Committer
+  alias: dralves
+  affiliation: Cloudera
+- name: Ippokratis Pandis
+  role: Committer
+  alias: ippokratis
+  affiliation: Amazon
+- name: Uwe L. Korn
+  role: PMC
+  alias: uwe
+  affiliation: Blue Yonder GmbH
+- name: Kouhei Sutou
+  role: PMC
+  alias: kou
+  affiliation: ClearCode
+- name: Philipp Moritz
+  role: PMC
+  alias: pcmoritz
+  affiliation: UC Berkeley RISELab
+- name: Phillip Cloud
+  role: PMC
+  alias: cpcloud
+  affiliation: Two Sigma

Review comment:
       Standard Cognition

##########
File path: _data/committers.yml
##########
@@ -0,0 +1,209 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to you under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Database of Apache Arrow committers and PMC
+#
+- name: Jacques Nadeau
+  role: VP
+  alias: jacques
+  affiliation: Dremio
+- name: Ted Dunning
+  role: PMC
+  alias: tdunning
+  affiliation: MapR
+- name: P. Taylor Goetz
+  role: PMC
+  alias: ptgoetz
+  affiliation: Monetate
+- name: Julian Hyde
+  role: PMC
+  alias: jhyde
+  affiliation: Looker
+- name: Reynold Xin
+  role: PMC
+  alias: rxin
+  affiliation: Databricks
+- name: James Taylor
+  role: PMC
+  alias: jamestaylor
+  affiliation: Salesforce
+- name: Julien Le Dem
+  role: PMC
+  alias: julien
+  affiliation: WeWork
+- name: Jake Luciani
+  role: PMC
+  alias: jake
+  affiliation: DataStax
+- name: Jason Altekruse
+  role: PMC
+  alias: json
+  affiliation: Workday
+- name: Alex Levenson
+  role: PMC
+  alias: alexlevenson
+  affiliation: Twitter
+- name: Parth Chandra
+  role: PMC
+  alias: parthc
+  affiliation: MapR
+- name: Marcel Kornacker
+  role: PMC
+  alias: marcel
+  affiliation: Independent
+- name: Steven Phillips
+  role: PMC
+  alias: smp
+  affiliation: Dremio
+- name: Hanifi Gunes
+  role: PMC
+  alias: hg
+  affiliation: MZ
+- name: Abdelhakim Deneche
+  role: PMC
+  alias: adeneche
+  affiliation: Dremio
+- name: Wes McKinney
+  role: PMC
+  alias: wesm
+  affiliation: Ursa Labs / RStudio
+- name: David Alves
+  role: Committer
+  alias: dralves
+  affiliation: Cloudera

Review comment:
       Now "CortexXus"

##########
File path: _data/committers.yml
##########
@@ -0,0 +1,209 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to you under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Database of Apache Arrow committers and PMC
+#
+- name: Jacques Nadeau
+  role: VP
+  alias: jacques
+  affiliation: Dremio
+- name: Ted Dunning
+  role: PMC
+  alias: tdunning
+  affiliation: MapR
+- name: P. Taylor Goetz
+  role: PMC
+  alias: ptgoetz
+  affiliation: Monetate
+- name: Julian Hyde
+  role: PMC
+  alias: jhyde
+  affiliation: Looker
+- name: Reynold Xin
+  role: PMC
+  alias: rxin
+  affiliation: Databricks
+- name: James Taylor
+  role: PMC
+  alias: jamestaylor
+  affiliation: Salesforce
+- name: Julien Le Dem
+  role: PMC
+  alias: julien
+  affiliation: WeWork
+- name: Jake Luciani
+  role: PMC
+  alias: jake
+  affiliation: DataStax
+- name: Jason Altekruse
+  role: PMC
+  alias: json
+  affiliation: Workday
+- name: Alex Levenson
+  role: PMC
+  alias: alexlevenson
+  affiliation: Twitter
+- name: Parth Chandra
+  role: PMC
+  alias: parthc
+  affiliation: MapR
+- name: Marcel Kornacker
+  role: PMC
+  alias: marcel
+  affiliation: Independent
+- name: Steven Phillips
+  role: PMC
+  alias: smp
+  affiliation: Dremio
+- name: Hanifi Gunes
+  role: PMC
+  alias: hg
+  affiliation: MZ
+- name: Abdelhakim Deneche
+  role: PMC
+  alias: adeneche
+  affiliation: Dremio
+- name: Wes McKinney
+  role: PMC
+  alias: wesm
+  affiliation: Ursa Labs / RStudio
+- name: David Alves
+  role: Committer
+  alias: dralves
+  affiliation: Cloudera
+- name: Ippokratis Pandis
+  role: Committer
+  alias: ippokratis
+  affiliation: Amazon
+- name: Uwe L. Korn
+  role: PMC
+  alias: uwe
+  affiliation: Blue Yonder GmbH
+- name: Kouhei Sutou
+  role: PMC
+  alias: kou
+  affiliation: ClearCode
+- name: Philipp Moritz
+  role: PMC
+  alias: pcmoritz
+  affiliation: UC Berkeley RISELab
+- name: Phillip Cloud
+  role: PMC
+  alias: cpcloud
+  affiliation: Two Sigma
+- name: Bryan Cutler
+  role: Committer
+  alias: cutlerb
+  affiliation: IBM
+- name: Li Jin
+  role: Committer
+  alias: icexelloss
+  affiliation: Two Sigma
+- name: Siddharth Teotia
+  role: PMC
+  alias: siddteotia
+  affiliation: Dremio
+- name: Brian Hulette
+  role: Committer
+  alias: bhulette
+  affiliation: Google
+- name: Robert Nishihara
+  role: Committer
+  alias: robertnishihara
+  affiliation: UC Berkeley RISELab

Review comment:
       Anyscale

##########
File path: _data/committers.yml
##########
@@ -0,0 +1,209 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to you under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Database of Apache Arrow committers and PMC
+#
+- name: Jacques Nadeau
+  role: VP
+  alias: jacques
+  affiliation: Dremio
+- name: Ted Dunning
+  role: PMC
+  alias: tdunning
+  affiliation: MapR
+- name: P. Taylor Goetz
+  role: PMC
+  alias: ptgoetz
+  affiliation: Monetate
+- name: Julian Hyde
+  role: PMC
+  alias: jhyde
+  affiliation: Looker
+- name: Reynold Xin
+  role: PMC
+  alias: rxin
+  affiliation: Databricks
+- name: James Taylor
+  role: PMC
+  alias: jamestaylor
+  affiliation: Salesforce
+- name: Julien Le Dem
+  role: PMC
+  alias: julien
+  affiliation: WeWork
+- name: Jake Luciani
+  role: PMC
+  alias: jake
+  affiliation: DataStax
+- name: Jason Altekruse
+  role: PMC
+  alias: json
+  affiliation: Workday
+- name: Alex Levenson
+  role: PMC
+  alias: alexlevenson
+  affiliation: Twitter
+- name: Parth Chandra
+  role: PMC
+  alias: parthc
+  affiliation: MapR
+- name: Marcel Kornacker
+  role: PMC
+  alias: marcel
+  affiliation: Independent
+- name: Steven Phillips
+  role: PMC
+  alias: smp
+  affiliation: Dremio
+- name: Hanifi Gunes
+  role: PMC
+  alias: hg
+  affiliation: MZ
+- name: Abdelhakim Deneche
+  role: PMC
+  alias: adeneche
+  affiliation: Dremio
+- name: Wes McKinney
+  role: PMC
+  alias: wesm
+  affiliation: Ursa Labs / RStudio
+- name: David Alves
+  role: Committer
+  alias: dralves
+  affiliation: Cloudera
+- name: Ippokratis Pandis
+  role: Committer
+  alias: ippokratis
+  affiliation: Amazon
+- name: Uwe L. Korn
+  role: PMC
+  alias: uwe
+  affiliation: Blue Yonder GmbH
+- name: Kouhei Sutou
+  role: PMC
+  alias: kou
+  affiliation: ClearCode
+- name: Philipp Moritz
+  role: PMC
+  alias: pcmoritz
+  affiliation: UC Berkeley RISELab

Review comment:
       Anyscale







[GitHub] [arrow-site] wesm commented on pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-636465872


   Thank you for starting this. I can help with the content / language. 
   
   I agree that engaging a web designer would be a good idea. I have budget to pay for this





[GitHub] [arrow-site] wesm commented on pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-649578293


   "Arrow memory format" -> "Arrow columnar protocol" or similar?





[GitHub] [arrow-site] chrish42 commented on pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
chrish42 commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-643284687


   As someone who's less involved with the Arrow project than the average person here, I really like it overall. It has answers to questions about Arrow that I had to do a fair amount of digging in the past to figure out. So I think overall it's moving the website in the right direction. FYI.





[GitHub] [arrow-site] wesm edited a comment on pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-649578293


   "Arrow memory format" -> "Arrow columnar IPC protocol" or similar?




[GitHub] [arrow-site] pitrou commented on pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#issuecomment-649460059


   "The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead."
   
   That's not very comprehensible. Perhaps this is about the IPC format? Which is a slightly different thing.




[GitHub] [arrow-site] nealrichardson commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r438850170



##########
File path: community.md
##########
@@ -0,0 +1,66 @@
+---
+layout: default
+title: Apache Arrow Community
+description: Links and resources for participating in Apache Arrow
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+# Apache Arrow Community
+
+We welcome participation from everyone and encourage you to join us, ask questions, and get involved.
+
+All participation in the Apache Arrow project is governed by the Apache Software Foundation's [code of conduct](https://www.apache.org/foundation/policies/conduct.html).
+
+## Questions?
+
+### Mailing lists
+
+These arrow.apache.org mailing lists are for project discussion:
+
+<ul>
+  <li> <code>user@</code> is for questions on using Apache Arrow libraries {% include mailing_list_links.html list="user" %} </li>
+  <li> <code>dev@</code> is for discussions about contributing to the project development {% include mailing_list_links.html list="dev" %} </li>
+</ul>
+
+When emailing one of the lists, you may want to prefix the subject line with one or more tags, like `[C++] why did this segfault?`, `[Python] trouble with wheels`, etc., so that the appropriate people in the community notice the message.
+
+In addition, these lists log several activity streams:
+
+<ul>
+  <li> <code>issues@</code> for JIRA activity {% include mailing_list_links.html list="issues" %} </li>

Review comment:
       Yes






[GitHub] [arrow-site] wesm commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r445134010



##########
File path: _includes/header.html
##########
@@ -50,22 +33,44 @@
           </a>
           <div class="dropdown-menu" aria-labelledby="navbarDropdownDocumentation">
             <a class="dropdown-item" href="{{ site.baseurl }}/docs">Project Docs</a>
-            <a class="dropdown-item" href="{{ site.baseurl }}/docs/python">Python</a>
+            <a class="dropdown-item" href="{{ site.baseurl }}/docs/format/Columnar.html">Specification</a>
+            <hr/>
+            <a class="dropdown-item" href="{{ site.baseurl }}/docs/c_glib">C GLib</a>
             <a class="dropdown-item" href="{{ site.baseurl }}/docs/cpp">C++</a>
+            <a class="dropdown-item" href="https://github.com/apache/arrow/blob/master/csharp/README.md">C#</a>
+            <a class="dropdown-item" href="https://godoc.org/github.com/apache/arrow/go/arrow">Go</a>
             <a class="dropdown-item" href="{{ site.baseurl }}/docs/java">Java</a>
-            <a class="dropdown-item" href="{{ site.baseurl }}/docs/c_glib">C GLib</a>
             <a class="dropdown-item" href="{{ site.baseurl }}/docs/js">JavaScript</a>
+            <a class="dropdown-item" href="https://github.com/apache/arrow/blob/master/matlab/README.md">MATLAB</a>
+            <a class="dropdown-item" href="{{ site.baseurl }}/docs/python">Python</a>
             <a class="dropdown-item" href="{{ site.baseurl }}/docs/r">R</a>
+            <a class="dropdown-item" href="https://github.com/apache/arrow/blob/master/ruby/README.md">Ruby</a>
+            <a class="dropdown-item" href="https://docs.rs/crate/arrow/">Rust</a>
+          </div>
+        </li>
+        <li class="nav-item dropdown">
+          <a class="nav-link dropdown-toggle" href="#"
+             id="navbarDropdownCommunity" role="button" data-toggle="dropdown"
+             aria-haspopup="true" aria-expanded="false">
+             Community
+          </a>
+          <div class="dropdown-menu" aria-labelledby="navbarDropdownCommunity">
+            <a class="dropdown-item" href="{{ site.baseurl }}/community/">Mailing Lists</a>

Review comment:
       "Communications"?

##########
File path: _includes/header.html
##########
@@ -50,22 +33,44 @@
           </a>
           <div class="dropdown-menu" aria-labelledby="navbarDropdownDocumentation">
             <a class="dropdown-item" href="{{ site.baseurl }}/docs">Project Docs</a>
-            <a class="dropdown-item" href="{{ site.baseurl }}/docs/python">Python</a>
+            <a class="dropdown-item" href="{{ site.baseurl }}/docs/format/Columnar.html">Specification</a>

Review comment:
       "Columnar Format"?

##########
File path: community.md
##########
@@ -0,0 +1,73 @@
+---
+layout: default
+title: Apache Arrow Community
+description: Links and resources for participating in Apache Arrow
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+# Apache Arrow Community
+
+We welcome participation from everyone and encourage you to join us, ask questions, and get involved.
+
+All participation in the Apache Arrow project is governed by the Apache Software Foundation's [code of conduct](https://www.apache.org/foundation/policies/conduct.html).
+
+## Questions?
+
+### Mailing lists
+
+These arrow.apache.org mailing lists are for project discussion:
+
+<ul>
+  <li> <code>user@</code> is for questions on using Apache Arrow libraries {% include mailing_list_links.html list="user" %} </li>
+  <li> <code>dev@</code> is for discussions about contributing to the project development {% include mailing_list_links.html list="dev" %} </li>
+</ul>
+
+When emailing one of the lists, you may want to prefix the subject line with one or more tags, like `[C++] why did this segfault?`, `[Python] trouble with wheels`, etc., so that the appropriate people in the community notice the message.
+
+You may also wish to subscript to these lists, which capture some activity streams:

Review comment:
       subscribe

##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (included nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.

Review comment:
       Let's link to the versioning backward/forward compatibility guarantees in the docs

##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (included nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.

Review comment:
       Here's a reframing -- I have been encouraging us to move away from creating a false equivalence between "Apache Arrow The Project" and the "Arrow Columnar Format". So anyplace where someone might say "Arrow _is_ the columnar format" we should correct them to say that "Arrow _contains_ a columnar format". Please edit / wordsmith as desired
   
   Apache Arrow is a software development platform for building high-performance applications that process and transport large data sets. It is designed to both improve the performance of analytical algorithms and the efficiency of moving data from one system (or programming language) to another. 
   
   A critical component of Apache Arrow is its **in-memory columnar format**, a standardized language-agnostic data structure specification for representing structured, table-like datasets in-memory. This data format has a rich data type system (including nested and user-defined data types) designed to support the needs of analytic database systems, data frame libraries, and more. The project contains many implementations of the Arrow columnar format along with utilities for reading and writing it to many common storage formats. 
   
   We do not anticipate that many third-party projects will choose to implement the Arrow columnar format themselves, instead choosing to depend on one of the official libraries. For projects that want to implement a small subset of the format, we have created some tools (like a C data interface) to assist with interoperability with the official Arrow libraries.
   
   The Arrow libraries contain many software components that assist with systems problems related to getting data in and out of remote storage systems and moving Arrow-formatted data over network interfaces. Some of these components can be used even in scenarios where the columnar format is not used at all. 
   
   Lastly, alongside software that helps with data access and IO-related issues, there are libraries of algorithms for performing analytical operations or queries against Arrow datasets.

##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (included nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.

Review comment:
       I think this para can be removed as of 1.0.0

##########
File path: _layouts/home.html
##########
@@ -0,0 +1,21 @@
+{% include top.html %}
+
+<body class="wrap">
+  <header>
+    {% include header.html %}
+  </header>
+  <div class="big-arrow-bg">
+    <div class="container p-lg-4 centered">
+      <img src="{{ site.baseurl }}/img/arrow-inverse.png" style="max-width: 80%;"/>

Review comment:
       Smaller also better imho

##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (included nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.

Review comment:
       I don't think we need to hedge regarding people storing Arrow data on disk starting with 1.0.0. We should state explicitly here, however, that we don't intend for Arrow to be a replacement for Parquet (an exceedingly common question) and, where relevant, that the columnar format makes trade-offs to support the performance requirements of in-memory analytics over purely file storage considerations. Parquet is not a "runtime in-memory format", and file formats almost always have to be deserialized into some in-memory data structure for processing; we intend for Arrow to be that in-memory data structure

##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (included nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
 
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+## Getting involved
 
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [private@arrow.apache.org](mailto:private@arrow.apache.org).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
 
 ### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques.  It is ideal when wanting to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes.  Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications.  Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
 
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+### What about "Arrow files" then?
 
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expensive CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file format".
 
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation.  Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.
 
-What about "Arrow files" then? Apache Arrow defines a binary "serialization" protocol for arranging a collection of Arrow columnar arrays (called a "record batch") that can be used for messaging and interprocess communication. You can put the protocol anywhere, including on disk, which can later be memory-mapped or read into memory and sent elsewhere.
+Some things to keep in mind when comparing the Arrow IPC file format and the
+Parquet format:
 
-This Arrow protocol is designed so that you can "map" a blob of Arrow data without doing any deserialization, so performing analytics on Arrow protocol data on disk can use memory-mapping and pay effectively zero cost. The protocol is used for many other things as well, such as streaming data between Spark SQL and Python for running pandas functions against chunks of Spark SQL data (these are called "pandas udfs").
+* Parquet is safe for long-term storage and archival purposes, meaning if
+  you write a file today, you can expect that any system that says they can
+  "read Parquet" will be able to read the file in 5 years or 10 years.
+  We are not yet making this assertion about long-term stability of the Arrow
+  format.
 
-In some applications, Parquet and Arrow can be used interchangeably for on-disk data serialization. Some things to keep in mind:
+* Reading Parquet files generally requires expensive decoding, while reading

Review comment:
       "expensive" is in the eye of the beholder. How about "requires efficient, but relatively complex decoding"

##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (included nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
 
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+## Getting involved
 
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [private@arrow.apache.org](mailto:private@arrow.apache.org).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
 
 ### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques.  It is ideal when wanting to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes.  Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications.  Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
 
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+### What about "Arrow files" then?
 
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expensive CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file format".
 
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation.  Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.
 
-What about "Arrow files" then? Apache Arrow defines a binary "serialization" protocol for arranging a collection of Arrow columnar arrays (called a "record batch") that can be used for messaging and interprocess communication. You can put the protocol anywhere, including on disk, which can later be memory-mapped or read into memory and sent elsewhere.
+Some things to keep in mind when comparing the Arrow IPC file format and the
+Parquet format:
 
-This Arrow protocol is designed so that you can "map" a blob of Arrow data without doing any deserialization, so performing analytics on Arrow protocol data on disk can use memory-mapping and pay effectively zero cost. The protocol is used for many other things as well, such as streaming data between Spark SQL and Python for running pandas functions against chunks of Spark SQL data (these are called "pandas udfs").
+* Parquet is safe for long-term storage and archival purposes, meaning if
+  you write a file today, you can expect that any system that says it can
+  "read Parquet" will be able to read the file in 5 years or 10 years.
+  We are not yet making this assertion about long-term stability of the Arrow
+  format.

Review comment:
       "We are not yet making this assertion about long-term stability of the Arrow format."
   
   --> "While the Arrow on-disk format is stable and will be readable by future versions of the libraries, it is not intended for long-term archival storage."

##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
 
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+## Getting involved
 
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [private@arrow.apache.org](mailto:private@arrow.apache.org).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
 
 ### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques.  It is ideal when you want to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes.  Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in a natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications.  Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
 
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+### What about "Arrow files" then?
 
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expensive CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file format".
 
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation.  Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.
 
-What about "Arrow files" then? Apache Arrow defines a binary "serialization" protocol for arranging a collection of Arrow columnar arrays (called a "record batch") that can be used for messaging and interprocess communication. You can put the protocol anywhere, including on disk, which can later be memory-mapped or read into memory and sent elsewhere.
+Some things to keep in mind when comparing the Arrow IPC file format and the
+Parquet format:
 
-This Arrow protocol is designed so that you can "map" a blob of Arrow data without doing any deserialization, so performing analytics on Arrow protocol data on disk can use memory-mapping and pay effectively zero cost. The protocol is used for many other things as well, such as streaming data between Spark SQL and Python for running pandas functions against chunks of Spark SQL data (these are called "pandas udfs").
+* Parquet is safe for long-term storage and archival purposes, meaning if
+  you write a file today, you can expect that any system that says it can
+  "read Parquet" will be able to read the file in 5 years or 10 years.
+  We are not yet making this assertion about long-term stability of the Arrow
+  format.
 
-In some applications, Parquet and Arrow can be used interchangeably for on-disk data serialization. Some things to keep in mind:
+* Reading Parquet files generally requires expensive decoding, while reading
+  Arrow IPC files is just a matter of transferring raw bytes from the storage
+  hardware.

Review comment:
       Instead of "just a matter of transferring raw bytes from the storage hardware." how about the more precise statement "reading Arrow IPC files does not involve any decoding because the on-disk representation is the same as the in-memory representation."

##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->

Review comment:
       perhaps merge this with some of the thoughts above

##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
 
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+## Getting involved
 
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [private@arrow.apache.org](mailto:private@arrow.apache.org).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
 
 ### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques.  It is ideal when you want to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes.  Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in a natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications.  Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
 
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+### What about "Arrow files" then?
 
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expensive CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file format".
 
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation.  Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.
 
-What about "Arrow files" then? Apache Arrow defines a binary "serialization" protocol for arranging a collection of Arrow columnar arrays (called a "record batch") that can be used for messaging and interprocess communication. You can put the protocol anywhere, including on disk, which can later be memory-mapped or read into memory and sent elsewhere.
+Some things to keep in mind when comparing the Arrow IPC file format and the
+Parquet format:
 
-This Arrow protocol is designed so that you can "map" a blob of Arrow data without doing any deserialization, so performing analytics on Arrow protocol data on disk can use memory-mapping and pay effectively zero cost. The protocol is used for many other things as well, such as streaming data between Spark SQL and Python for running pandas functions against chunks of Spark SQL data (these are called "pandas udfs").
+* Parquet is safe for long-term storage and archival purposes, meaning if
+  you write a file today, you can expect that any system that says they can
+  "read Parquet" will be able to read the file in 5 years or 10 years.
+  We are not yet making this assertion about long-term stability of the Arrow
+  format.
 
-In some applications, Parquet and Arrow can be used interchangeably for on-disk data serialization. Some things to keep in mind:
+* Reading Parquet files generally requires expensive decoding, while reading
+  Arrow IPC files is just a matter of transferring raw bytes from the storage
+  hardware.
 
-* Parquet is intended for "archival" purposes, meaning if you write a file today, we expect that any system that says they can "read Parquet" will be able to read the file in 5 years or 7 years. We are not yet making this assertion about long-term stability of the Arrow format.
-* Parquet is generally a lot more expensive to read because it must be decoded into some other data structure. Arrow protocol data can simply be memory-mapped.
-* Parquet files are often much smaller than Arrow-protocol-on-disk because of the data encoding schemes that Parquet uses. If your disk storage or network is slow, Parquet may be a better choice.
+* Parquet files are often much smaller than Arrow IPC files because of the
+  elaborate encoding schemes that Parquet uses. If your disk storage or network
+  is slow, Parquet may be a better choice even for short-term storage or caching.
+
+### What about the "Feather" file format?
+
+The Feather v1 format started as a separate specification, but the Feather v2
+format is simply another, easier-to-remember name for the Arrow IPC file format.
 
 ### How does Arrow relate to Flatbuffers?
 
-Flatbuffers is a domain-agnostic low-level building block for binary data formats. It cannot be used directly for data analysis tasks without a lot of manual scaffolding. Arrow is a data layer aimed directly at the needs of data analysis, providing elaborate data types (including extensible logical types), built-in support for "null" values (a.k.a "N/A"), and an expanding toolbox of I/O and computing facilities.
+Flatbuffers is a low-level building block for binary data serialization.
+It is not adapted to the representation of large, structured, homogeneous
+data, and does not sit at the right abstraction layer for data analysis tasks.
+
+Arrow is a data layer aimed directly at the needs of data analysis, providing
+elaborate data types (including extensible logical types), built-in support

Review comment:
       Use a more neutral word than "elaborate". How about, "providing a comprehensive collection of data types required for analytics" or something similar

##########
File path: index.html
##########
@@ -1,72 +1,62 @@
 ---
-layout: default
+layout: home
 ---
-<div class="jumbotron">
-    <h1>Apache Arrow</h1>
-    <p class="lead">A cross-language development platform for in-memory data</p>
-    <p>
-        <a class="btn btn-lg btn-success" style="white-space: normal;" href="mailto:dev-subscribe@arrow.apache.org" role="button">Join Mailing List</a>
-        <a class="btn btn-lg btn-primary" style="white-space: normal;" href="{{ site.baseurl }}/install/" role="button">Install ({{site.data.versions['current'].number}} Release - {{site.data.versions['current'].date}})</a>
-    </p>
-</div>
-<h5>
-  Interested in contributing?
-  <small class="text-muted">Join the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/"><strong>mailing list</strong></a> or check out the <a href="https://cwiki.apache.org/confluence/display/ARROW"><strong>developer wiki</strong></a>.</small>
-</h5>
-<h5>
-  <a href="{{ site.baseurl }}/blog/"><strong>See Latest News</strong></a>
-</h5>
-<p>
-  {{ site.description }}
-</p>
-<hr />
+<h1>What is Arrow?</h1>
 <div class="row">
   <div class="col-lg-4">
-      <h2 class="mt-3">Fast</h2>
-      <p>Apache Arrow&#8482; enables execution engines to take advantage of the latest SIMD (Single instruction, multiple data) operations included in modern processors, for native vectorized optimization of analytical data processing. Columnar layout is optimized for data locality for better performance on modern hardware like CPUs and GPUs.</p>
-      <p>The Arrow memory format supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <h2 class="mt-3">Format</h2>
+      <p>Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <p><a href="{{ site.baseurl }}/overview/">Learn more</a> about the design or
+        <a href="{{ site.baseurl }}/docs/format/Columnar.html">read the specification</a>.</p>
   </div>
   <div class="col-lg-4">
-      <h2 class="mt-3">Flexible</h2>
-      <p>Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust implementations are in progress and more languages are welcome.
+      <h2 class="mt-3">Libraries</h2>
+      <p>The Arrow project includes libraries that implement the memory specification in many languages. They enable you to use the Arrow format as an efficient means of sharing data across languages and processes. Libraries are available for <a href="{{ site.baseurl }}/docs/c_glib/">C</a>, <a href="{{ site.baseurl }}/docs/cpp/">C++</a>, <a href="https://github.com/apache/arrow/blob/master/csharp/README.md">C#</a>, <a href="https://godoc.org/github.com/apache/arrow/go/arrow">Go</a>, <a href="{{ site.baseurl }}/docs/java/">Java</a>, <a href="{{ site.baseurl }}/docs/js/">JavaScript</a>, <a href="https://github.com/apache/arrow/blob/master/matlab/README.md">MATLAB</a>, <a href="{{ site.baseurl }}/docs/python/">Python</a>, <a href="{{ site.baseurl }}/docs/r/">R</a>, <a href="https://github.com/apache/arrow/blob/master/ruby/README.md">Ruby</a>, and <a href="https://docs.rs/crate/arrow/">Rust</a>.

Review comment:
       Arrow's libraries provide building blocks for creating high performance analytics applications. The libraries implement the Arrow columnar format and address a wide spectrum of problems related to data access, in-memory data management, and analytical query processing. 

##########
File path: index.html
##########
@@ -1,72 +1,62 @@
 ---
-layout: default
+layout: home
 ---
-<div class="jumbotron">
-    <h1>Apache Arrow</h1>
-    <p class="lead">A cross-language development platform for in-memory data</p>
-    <p>
-        <a class="btn btn-lg btn-success" style="white-space: normal;" href="mailto:dev-subscribe@arrow.apache.org" role="button">Join Mailing List</a>
-        <a class="btn btn-lg btn-primary" style="white-space: normal;" href="{{ site.baseurl }}/install/" role="button">Install ({{site.data.versions['current'].number}} Release - {{site.data.versions['current'].date}})</a>
-    </p>
-</div>
-<h5>
-  Interested in contributing?
-  <small class="text-muted">Join the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/"><strong>mailing list</strong></a> or check out the <a href="https://cwiki.apache.org/confluence/display/ARROW"><strong>developer wiki</strong></a>.</small>
-</h5>
-<h5>
-  <a href="{{ site.baseurl }}/blog/"><strong>See Latest News</strong></a>
-</h5>
-<p>
-  {{ site.description }}
-</p>
-<hr />
+<h1>What is Arrow?</h1>
 <div class="row">
   <div class="col-lg-4">
-      <h2 class="mt-3">Fast</h2>
-      <p>Apache Arrow&#8482; enables execution engines to take advantage of the latest SIMD (Single instruction, multiple data) operations included in modern processors, for native vectorized optimization of analytical data processing. Columnar layout is optimized for data locality for better performance on modern hardware like CPUs and GPUs.</p>
-      <p>The Arrow memory format supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <h2 class="mt-3">Format</h2>
+      <p>Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <p><a href="{{ site.baseurl }}/overview/">Learn more</a> about the design or
+        <a href="{{ site.baseurl }}/docs/format/Columnar.html">read the specification</a>.</p>
   </div>
   <div class="col-lg-4">
-      <h2 class="mt-3">Flexible</h2>
-      <p>Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust implementations are in progress and more languages are welcome.
+      <h2 class="mt-3">Libraries</h2>
+      <p>The Arrow project includes libraries that implement the memory specification in many languages. They enable you to use the Arrow format as an efficient means of sharing data across languages and processes. Libraries are available for <a href="{{ site.baseurl }}/docs/c_glib/">C</a>, <a href="{{ site.baseurl }}/docs/cpp/">C++</a>, <a href="https://github.com/apache/arrow/blob/master/csharp/README.md">C#</a>, <a href="https://godoc.org/github.com/apache/arrow/go/arrow">Go</a>, <a href="{{ site.baseurl }}/docs/java/">Java</a>, <a href="{{ site.baseurl }}/docs/js/">JavaScript</a>, <a href="https://github.com/apache/arrow/blob/master/matlab/README.md">MATLAB</a>, <a href="{{ site.baseurl }}/docs/python/">Python</a>, <a href="{{ site.baseurl }}/docs/r/">R</a>, <a href="https://github.com/apache/arrow/blob/master/ruby/README.md">Ruby</a>, and <a href="https://docs.rs/crate/arrow/">Rust</a>.
       </p>
+      See <a href="{{ site.baseurl }}/install/">how to install</a> and <a href="{{ site.baseurl }}/getting_started/">get started</a>.
   </div>
   <div class="col-lg-4">
-      <h2 class="mt-3">Standard</h2>
-      <p>Apache Arrow is backed by key developers of 13 major open source projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm making it the de-facto standard for columnar in-memory analytics.</p>
-      <p>Learn more about projects that are <a href="{{ site.baseurl }}/powered_by/">Powered By Apache Arrow</a></p>
+      <h2 class="mt-3">Applications</h2>
+      <p>Arrow libraries provide a foundation for developers to build fast analytics applications. <a href="{{ site.baseurl }}/powered_by/">Many popular projects</a> use Arrow to ship columnar data efficiently or as the basis for analytic engines.</p>
+      <p>The libraries also include built-in features for working with data directly, including Parquet file reading and querying large datasets. See more Arrow <a href="{{ site.baseurl }}/use_cases/">use cases</a>.</p>
   </div>
 </div>
-<hr />
+
+<h1>Why Arrow?</h1>

Review comment:
       "Why use the Arrow Columnar Format?" 

##########
File path: index.html
##########
@@ -1,72 +1,62 @@
 ---
-layout: default
+layout: home
 ---
-<div class="jumbotron">
-    <h1>Apache Arrow</h1>
-    <p class="lead">A cross-language development platform for in-memory data</p>
-    <p>
-        <a class="btn btn-lg btn-success" style="white-space: normal;" href="mailto:dev-subscribe@arrow.apache.org" role="button">Join Mailing List</a>
-        <a class="btn btn-lg btn-primary" style="white-space: normal;" href="{{ site.baseurl }}/install/" role="button">Install ({{site.data.versions['current'].number}} Release - {{site.data.versions['current'].date}})</a>
-    </p>
-</div>
-<h5>
-  Interested in contributing?
-  <small class="text-muted">Join the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/"><strong>mailing list</strong></a> or check out the <a href="https://cwiki.apache.org/confluence/display/ARROW"><strong>developer wiki</strong></a>.</small>
-</h5>
-<h5>
-  <a href="{{ site.baseurl }}/blog/"><strong>See Latest News</strong></a>
-</h5>
-<p>
-  {{ site.description }}
-</p>
-<hr />
+<h1>What is Arrow?</h1>
 <div class="row">
   <div class="col-lg-4">
-      <h2 class="mt-3">Fast</h2>
-      <p>Apache Arrow&#8482; enables execution engines to take advantage of the latest SIMD (Single instruction, multiple data) operations included in modern processors, for native vectorized optimization of analytical data processing. Columnar layout is optimized for data locality for better performance on modern hardware like CPUs and GPUs.</p>
-      <p>The Arrow memory format supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <h2 class="mt-3">Format</h2>
+      <p>Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <p><a href="{{ site.baseurl }}/overview/">Learn more</a> about the design or
+        <a href="{{ site.baseurl }}/docs/format/Columnar.html">read the specification</a>.</p>
   </div>
   <div class="col-lg-4">
-      <h2 class="mt-3">Flexible</h2>
-      <p>Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust implementations are in progress and more languages are welcome.
+      <h2 class="mt-3">Libraries</h2>
+      <p>The Arrow project includes libraries that implement the memory specification in many languages. They enable you to use the Arrow format as an efficient means of sharing data across languages and processes. Libraries are available for <a href="{{ site.baseurl }}/docs/c_glib/">C</a>, <a href="{{ site.baseurl }}/docs/cpp/">C++</a>, <a href="https://github.com/apache/arrow/blob/master/csharp/README.md">C#</a>, <a href="https://godoc.org/github.com/apache/arrow/go/arrow">Go</a>, <a href="{{ site.baseurl }}/docs/java/">Java</a>, <a href="{{ site.baseurl }}/docs/js/">JavaScript</a>, <a href="https://github.com/apache/arrow/blob/master/matlab/README.md">MATLAB</a>, <a href="{{ site.baseurl }}/docs/python/">Python</a>, <a href="{{ site.baseurl }}/docs/r/">R</a>, <a href="https://github.com/apache/arrow/blob/master/ruby/README.md">Ruby</a>, and <a href="https://docs.rs/crate/arrow/">Rust</a>.
       </p>
+      <p>See <a href="{{ site.baseurl }}/install/">how to install</a> and <a href="{{ site.baseurl }}/getting_started/">get started</a>.</p>
   </div>
   <div class="col-lg-4">
-      <h2 class="mt-3">Standard</h2>
-      <p>Apache Arrow is backed by key developers of 13 major open source projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm making it the de-facto standard for columnar in-memory analytics.</p>
-      <p>Learn more about projects that are <a href="{{ site.baseurl }}/powered_by/">Powered By Apache Arrow</a></p>
+      <h2 class="mt-3">Applications</h2>

Review comment:
       Ecosystem?

##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
 
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+## Getting involved
 
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [private@arrow.apache.org](mailto:private@arrow.apache.org).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
 
 ### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques.  It is ideal when you want to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes.  Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in a natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications.  Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
 
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+### What about "Arrow files" then?
 
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expensive CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file format".
 
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation.  Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.

Review comment:
       +1
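
  As an illustration of the zero-copy claim in this hunk, here is a minimal sketch using the pyarrow library (the file path, table contents, and column names are illustrative, not from the PR):

  ```python
  import os
  import tempfile

  import pyarrow as pa

  # Write a small table in the Arrow IPC file format.
  table = pa.table({"x": [1.0, 2.0, 3.0], "y": [10, 20, 30]})
  path = os.path.join(tempfile.mkdtemp(), "data.arrow")
  with pa.OSFile(path, "wb") as f:
      with pa.ipc.new_file(f, table.schema) as writer:
          writer.write_table(table)

  # Memory-map the file: the returned batches reference the mapped
  # pages directly, so no decoding or copying of values takes place.
  with pa.memory_map(path, "r") as source:
      mapped = pa.ipc.open_file(source).read_all()

  print(mapped.equals(table))  # True
  ```

  Because the on-disk layout is the in-memory layout, "reading" here is just establishing the mapping; the OS pages data in lazily as columns are touched.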

##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->

Review comment:
      Traditionally, data processing engine developers have created custom data structures to represent datasets in-memory while they are being processed. Given the "custom" nature of these data structures, they must also develop serialization interfaces to convert between these data structures and different file formats, network wire protocols, database clients, and other data transport interfaces. The net result is an incredible amount of waste, both in developer time and in CPU cycles spent serializing data from one format to another.
   
   Therefore, the rationale for Arrow's in-memory columnar data format is to provide an out-of-the-box solution to several interrelated problems:
   
  * A general purpose tabular data representation that is highly efficient to process on modern hardware while also being suitable for a wide spectrum of use cases. We believe that fewer and fewer systems will create their own data structures, and will simply use Arrow instead.
  * Support for both random access and streaming / scan-based workloads.
  * A standardized memory format facilitates reuse of libraries of algorithms. When custom in-memory data formats are used, common algorithms must often be rewritten to target those custom data formats.
  * Systems that use or support Arrow can transfer data between them at little-to-no cost, radically reducing the serialization overhead that can often represent 80-90% of computing costs in analytical workloads.
  * The language-agnostic design of the Arrow format enables systems written in different programming languages (even those running on the JVM) to communicate datasets without serialization overhead. For example, a Java application can call a C or C++ algorithm on data that originated in the JVM.
   
   ... probably some other stuff can be added here
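
  The cross-language, zero-serialization point above can be illustrated with a short pyarrow sketch (assuming pyarrow is installed; the buffer and column names are made up for illustration):

  ```python
  import pyarrow as pa

  # Build a record batch, Arrow's unit of columnar data exchange.
  batch = pa.record_batch(
      [pa.array([1, 2, 3]), pa.array(["a", "b", "c"])],
      names=["id", "label"],
  )

  # Serialize it with the Arrow IPC stream format into an in-memory
  # buffer; in practice the sink could be a socket or shared memory.
  sink = pa.BufferOutputStream()
  with pa.ipc.new_stream(sink, batch.schema) as writer:
      writer.write_batch(batch)
  buf = sink.getvalue()

  # A consumer (possibly another process, or another language's Arrow
  # library) reads the batch back without re-encoding the column data.
  reader = pa.ipc.open_stream(buf)
  received = reader.read_all()

  print(received.column("label").to_pylist())  # ['a', 'b', 'c']
  ```

  The bytes in `buf` are the columnar data itself plus a small Flatbuffers-encoded schema header, which is why any Arrow implementation can consume them directly.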

##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?

Review comment:
       "Apache Arrow"

##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
 
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+## Getting involved
 
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [private@arrow.apache.org](mailto:private@arrow.apache.org).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
 
 ### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques.  It is ideal when you want to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes.  Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in a natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications.  Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
 
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+### What about "Arrow files" then?
 
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expensive CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file format".
 
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation.  Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.
 
-What about "Arrow files" then? Apache Arrow defines a binary "serialization" protocol for arranging a collection of Arrow columnar arrays (called a "record batch") that can be used for messaging and interprocess communication. You can put the protocol anywhere, including on disk, which can later be memory-mapped or read into memory and sent elsewhere.
+Some things to keep in mind when comparing the Arrow IPC file format and the
+Parquet format:
 
-This Arrow protocol is designed so that you can "map" a blob of Arrow data without doing any deserialization, so performing analytics on Arrow protocol data on disk can use memory-mapping and pay effectively zero cost. The protocol is used for many other things as well, such as streaming data between Spark SQL and Python for running pandas functions against chunks of Spark SQL data (these are called "pandas udfs").
+* Parquet is safe for long-term storage and archival purposes, meaning if
+  you write a file today, you can expect that any system that claims to
+  "read Parquet" will be able to read the file in 5 or 10 years.
+  We are not yet making this assertion about long-term stability of the Arrow
+  format.
 
-In some applications, Parquet and Arrow can be used interchangeably for on-disk data serialization. Some things to keep in mind:
+* Reading Parquet files generally requires expensive decoding, while reading
+  Arrow IPC files is just a matter of transferring raw bytes from the storage
+  hardware.
 
-* Parquet is intended for "archival" purposes, meaning if you write a file today, we expect that any system that says they can "read Parquet" will be able to read the file in 5 years or 7 years. We are not yet making this assertion about long-term stability of the Arrow format.
-* Parquet is generally a lot more expensive to read because it must be decoded into some other data structure. Arrow protocol data can simply be memory-mapped.
-* Parquet files are often much smaller than Arrow-protocol-on-disk because of the data encoding schemes that Parquet uses. If your disk storage or network is slow, Parquet may be a better choice.
+* Parquet files are often much smaller than Arrow IPC files because of the
+  elaborate encoding schemes that Parquet uses. If your disk storage or network
+  is slow, Parquet may be a better choice even for short-term storage or caching.
+
+### What about the "Feather" file format?
+
+The Feather v1 format started as a separate specification, but the Feather v2
+format is just another, easier-to-remember name for the Arrow IPC file format.
 
 ### How does Arrow relate to Flatbuffers?
 
-Flatbuffers is a domain-agnostic low-level building block for binary data formats. It cannot be used directly for data analysis tasks without a lot of manual scaffolding. Arrow is a data layer aimed directly at the needs of data analysis, providing elaborate data types (including extensible logical types), built-in support for "null" values (a.k.a "N/A"), and an expanding toolbox of I/O and computing facilities.
+Flatbuffers is a low-level building block for binary data serialization.
+It is not adapted to the representation of large, structured, homogeneous
+data, and does not sit at the right abstraction layer for data analysis tasks.
+
+Arrow is a data layer aimed directly at the needs of data analysis, providing
+elaborate data types (including extensible logical types), built-in support
+for "null" values (representing missing data), and an expanding toolbox of I/O
+and computing facilities.
 
-The Arrow file format does use Flatbuffers under the hood to facilitate low-level metadata serialization. However, Arrow data has much richer semantics than Flatbuffers data.
+The Arrow file format does use Flatbuffers under the hood to facilitate low-level
+metadata serialization, but the Arrow data format uses its own representation

Review comment:
       maybe "to serialize schemas and other metadata needed to implement the Arrow binary IPC protocol"

##########
File path: getting_started.md
##########
@@ -0,0 +1,74 @@
+---
+layout: default
+title: Getting started
+description: Links to user guides to help you start using Arrow
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+# Getting started
+
+This page collects resources and guides for using Arrow in all of the project's languages.
+For reference on official release packages, see the
+[install page]({{ site.baseurl }}/install/).
+
+## C
+
+Glib

Review comment:
       TODO

##########
File path: index.html
##########
@@ -1,72 +1,62 @@
 ---
-layout: default
+layout: home
 ---
-<div class="jumbotron">
-    <h1>Apache Arrow</h1>
-    <p class="lead">A cross-language development platform for in-memory data</p>
-    <p>
-        <a class="btn btn-lg btn-success" style="white-space: normal;" href="mailto:dev-subscribe@arrow.apache.org" role="button">Join Mailing List</a>
-        <a class="btn btn-lg btn-primary" style="white-space: normal;" href="{{ site.baseurl }}/install/" role="button">Install ({{site.data.versions['current'].number}} Release - {{site.data.versions['current'].date}})</a>
-    </p>
-</div>
-<h5>
-  Interested in contributing?
-  <small class="text-muted">Join the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/"><strong>mailing list</strong></a> or check out the <a href="https://cwiki.apache.org/confluence/display/ARROW"><strong>developer wiki</strong></a>.</small>
-</h5>
-<h5>
-  <a href="{{ site.baseurl }}/blog/"><strong>See Latest News</strong></a>
-</h5>
-<p>
-  {{ site.description }}
-</p>
-<hr />
+<h1>What is Arrow?</h1>
 <div class="row">
   <div class="col-lg-4">
-      <h2 class="mt-3">Fast</h2>
-      <p>Apache Arrow&#8482; enables execution engines to take advantage of the latest SIMD (Single instruction, multiple data) operations included in modern processors, for native vectorized optimization of analytical data processing. Columnar layout is optimized for data locality for better performance on modern hardware like CPUs and GPUs.</p>
-      <p>The Arrow memory format supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <h2 class="mt-3">Format</h2>
+      <p>Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <p><a href="{{ site.baseurl }}/overview/">Learn more</a> about the design or
+        <a href="{{ site.baseurl }}/docs/format/Columnar.html">read the specification</a>.</p>
   </div>
   <div class="col-lg-4">
-      <h2 class="mt-3">Flexible</h2>
-      <p>Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust implementations are in progress and more languages are welcome.
+      <h2 class="mt-3">Libraries</h2>
+      <p>The Arrow project includes libraries that implement the memory specification in many languages. They enable you to use the Arrow format as an efficient means of sharing data across languages and processes. Libraries are available for <a href="{{ site.baseurl }}/docs/c_glib/">C</a>, <a href="{{ site.baseurl }}/docs/cpp/">C++</a>, <a href="https://github.com/apache/arrow/blob/master/csharp/README.md">C#</a>, <a href="https://godoc.org/github.com/apache/arrow/go/arrow">Go</a>, <a href="{{ site.baseurl }}/docs/java/">Java</a>, <a href="{{ site.baseurl }}/docs/js/">JavaScript</a>, <a href="https://github.com/apache/arrow/blob/master/matlab/README.md">MATLAB</a>, <a href="{{ site.baseurl }}/docs/python/">Python</a>, <a href="{{ site.baseurl }}/docs/r/">R</a>, <a href="https://github.com/apache/arrow/blob/master/ruby/README.md">Ruby</a>, and <a href="https://docs.rs/crate/arrow/">Rust</a>.
       </p>
+      <p>See <a href="{{ site.baseurl }}/install/">how to install</a> and <a href="{{ site.baseurl }}/getting_started/">get started</a>.</p>
   </div>
   <div class="col-lg-4">
-      <h2 class="mt-3">Standard</h2>
-      <p>Apache Arrow is backed by key developers of 13 major open source projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm making it the de-facto standard for columnar in-memory analytics.</p>
-      <p>Learn more about projects that are <a href="{{ site.baseurl }}/powered_by/">Powered By Apache Arrow</a></p>
+      <h2 class="mt-3">Applications</h2>
+      <p>Arrow libraries provide a foundation for developers to build fast analytics applications. <a href="{{ site.baseurl }}/powered_by/">Many popular projects</a> use Arrow to ship columnar data efficiently or as the basis for analytic engines.</p>
+      <p>The libraries also include built-in features for working with data directly, including Parquet file reading and querying large datasets. See more Arrow <a href="{{ site.baseurl }}/use_cases/">use cases</a>.</p>

Review comment:
       I would say to condense the 2nd and 3rd points here and change this 3rd one to be about the ecosystem/community

##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Arrow Flight), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
 
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+## Getting involved
 
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+### I have some questions. How can I get help?
+
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, create one.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [private@arrow.apache.org](mailto:private@arrow.apache.org).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
 
 ### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
+
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques.  It is ideal when you want to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on but must be decoded in
+large chunks.
+
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes.  Arrow data is not compressed (or only lightly so,
+when using dictionary encoding) but laid out in a natural format for the CPU,
+so that data can be accessed at arbitrary places at full speed.
+
+Therefore, Arrow and Parquet are not competitors: they complement each other
+and are commonly used together in applications.  Storing your data on disk
+using Parquet, and reading it into memory in the Arrow format, will allow
+you to make the most of your computing hardware.
 
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+### What about "Arrow files" then?
 
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expensive CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Apache Arrow defines an inter-process communication (IPC) mechanism to
+transfer a collection of Arrow columnar arrays (called a "record batch").
+It can be used synchronously between processes using the Arrow "stream format",
+or asynchronously by first persisting data on storage using the Arrow "file format".
 
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+The Arrow IPC mechanism is based on the Arrow in-memory format, such that
+there is no translation necessary between the on-disk representation and
+the in-memory representation.  Therefore, performing analytics on an Arrow
+IPC file can use memory-mapping and pay effectively zero cost.
 
-What about "Arrow files" then? Apache Arrow defines a binary "serialization" protocol for arranging a collection of Arrow columnar arrays (called a "record batch") that can be used for messaging and interprocess communication. You can put the protocol anywhere, including on disk, which can later be memory-mapped or read into memory and sent elsewhere.
+Some things to keep in mind when comparing the Arrow IPC file format and the
+Parquet format:
 
-This Arrow protocol is designed so that you can "map" a blob of Arrow data without doing any deserialization, so performing analytics on Arrow protocol data on disk can use memory-mapping and pay effectively zero cost. The protocol is used for many other things as well, such as streaming data between Spark SQL and Python for running pandas functions against chunks of Spark SQL data (these are called "pandas udfs").
+* Parquet is safe for long-term storage and archival purposes, meaning if
+  you write a file today, you can expect that any system that says they can
+  "read Parquet" will be able to read the file in 5 years or 10 years.
+  We are not yet making this assertion about long-term stability of the Arrow
+  format.
 
-In some applications, Parquet and Arrow can be used interchangeably for on-disk data serialization. Some things to keep in mind:
+* Reading Parquet files generally requires expensive decoding, while reading
+  Arrow IPC files is just a matter of transferring raw bytes from the storage
+  hardware.
 
-* Parquet is intended for "archival" purposes, meaning if you write a file today, we expect that any system that says they can "read Parquet" will be able to read the file in 5 years or 7 years. We are not yet making this assertion about long-term stability of the Arrow format.
-* Parquet is generally a lot more expensive to read because it must be decoded into some other data structure. Arrow protocol data can simply be memory-mapped.
-* Parquet files are often much smaller than Arrow-protocol-on-disk because of the data encoding schemes that Parquet uses. If your disk storage or network is slow, Parquet may be a better choice.
+* Parquet files are often much smaller than Arrow IPC files because of the
+  elaborate encoding schemes that Parquet uses. If your disk storage or network
+  is slow, Parquet may be a better choice even for short-term storage or caching.
+
+### What about the "Feather" file format?
+
+The Feather v1 format started as a separate specification, but the Feather v2
+format is just another, easier-to-remember name for the Arrow IPC file format.

Review comment:
       "started as a separate specification" -> "was a simplified custom container for writing a subset of the Arrow format to disk prior to the development of the Arrow IPC file format. "Feather version 2" is now exactly the Arrow IPC file format and we have retained the "Feather" name and APIs for backwards compatibility."

##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+* Parquet files are often much smaller than Arrow IPC files because of the
+  elaborate encoding schemes that Parquet uses. If your disk storage or network

Review comment:
       "elaborate" seems a bit emotionally charged to me, let's use something more neutral and precise
   
   "elaborate encoding schemes" -> "columnar data compression strategies"

##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only

Review comment:
       Maybe "columnar format and protocol"

##########
File path: faq.md
##########
@@ -24,32 +24,155 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+### Why create a new standard?

Review comment:
       "Why define a standard for columnar in-memory?"
   
   There can't be a new standard if there isn't an old one. There never was




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] robert-wagner commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
robert-wagner commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r438788399



##########
File path: community.md
##########
@@ -0,0 +1,66 @@
+---
+layout: default
+title: Apache Arrow Community
+description: Links and resources for participating in Apache Arrow
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+# Apache Arrow Community
+
+We welcome participation from everyone and encourage you to join us, ask questions, and get involved.
+
+All participation in the Apache Arrow project is governed by the Apache Software Foundation's [code of conduct](https://www.apache.org/foundation/policies/conduct.html).
+
+## Questions?
+
+### Mailing lists
+
+These arrow.apache.org mailing lists are for project discussion:
+
+<ul>
+  <li> <code>user@</code> is for questions on using Apache Arrow libraries {% include mailing_list_links.html list="user" %} </li>
+  <li> <code>dev@</code> is for discussions about contributing to the project development {% include mailing_list_links.html list="dev" %} </li>
+</ul>
+
+When emailing one of the lists, you may want to prefix the subject line with one or more tags, like `[C++] why did this segfault?`, `[Python] trouble with wheels`, etc., so that the appropriate people in the community notice the message.
+
+In addition, these lists log several activity streams:
+
+<ul>
+  <li> <code>issues@</code> for JIRA activity {% include mailing_list_links.html list="issues" %} </li>

Review comment:
       In light of the upcoming split between issues and Jira (https://lists.apache.org/thread.html/r9ef1b83fc7261670c1a80f26c9e13f7999c01202793dae79e43a150b%40%3Cdev.arrow.apache.org%3E) should this be updated?







[GitHub] [arrow-site] bkietz commented on a change in pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r457466216



##########
File path: .github/workflows/deploy.yml
##########
@@ -45,7 +45,7 @@ jobs:
       - name: Configure for GitHub Pages on master
         run: |
           owner=$(jq --raw-output .repository.owner.login ${GITHUB_EVENT_PATH})
-          repository=$(jq .repository.name ${GITHUB_EVENT_PATH})
+          repository=$(jq --raw-output .repository.name ${GITHUB_EVENT_PATH})

Review comment:
       can this not be replaced as above with
   ```suggestion
             repository=${{ github.repository.name }}
   ```
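Regardless of which approach is taken, the reason `--raw-output` was added is worth noting: jq prints JSON by default, so a string value keeps its quotes. A small Python analogy (the event payload here is a made-up example):

```python
import json

# A stripped-down stand-in for the GitHub event payload.
event = {"repository": {"name": "arrow-site", "owner": {"login": "apache"}}}

# jq's default output is JSON-encoded, so a string keeps its quotes,
# which would then leak into the shell variable:
quoted = json.dumps(event["repository"]["name"])
# --raw-output emits the bare string instead:
raw = event["repository"]["name"]

assert quoted == '"arrow-site"'
assert raw == "arrow-site"
```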







[GitHub] [arrow-site] nealrichardson commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r445207165



##########
File path: index.html
##########
@@ -1,72 +1,62 @@
 ---
-layout: default
+layout: home
 ---
-<div class="jumbotron">
-    <h1>Apache Arrow</h1>
-    <p class="lead">A cross-language development platform for in-memory data</p>
-    <p>
-        <a class="btn btn-lg btn-success" style="white-space: normal;" href="mailto:dev-subscribe@arrow.apache.org" role="button">Join Mailing List</a>
-        <a class="btn btn-lg btn-primary" style="white-space: normal;" href="{{ site.baseurl }}/install/" role="button">Install ({{site.data.versions['current'].number}} Release - {{site.data.versions['current'].date}})</a>
-    </p>
-</div>
-<h5>
-  Interested in contributing?
-  <small class="text-muted">Join the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/"><strong>mailing list</strong></a> or check out the <a href="https://cwiki.apache.org/confluence/display/ARROW"><strong>developer wiki</strong></a>.</small>
-</h5>
-<h5>
-  <a href="{{ site.baseurl }}/blog/"><strong>See Latest News</strong></a>
-</h5>
-<p>
-  {{ site.description }}
-</p>
-<hr />
+<h1>What is Arrow?</h1>
 <div class="row">
   <div class="col-lg-4">
-      <h2 class="mt-3">Fast</h2>
-      <p>Apache Arrow&#8482; enables execution engines to take advantage of the latest SIMD (Single instruction, multiple data) operations included in modern processors, for native vectorized optimization of analytical data processing. Columnar layout is optimized for data locality for better performance on modern hardware like CPUs and GPUs.</p>
-      <p>The Arrow memory format supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <h2 class="mt-3">Format</h2>
+      <p>Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
+      <p><a href="{{ site.baseurl }}/overview/">Learn more</a> about the design or
+        <a href="{{ site.baseurl }}/docs/format/Columnar.html">read the specification</a>.</p>
   </div>
   <div class="col-lg-4">
-      <h2 class="mt-3">Flexible</h2>
-      <p>Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust implementations are in progress and more languages are welcome.
+      <h2 class="mt-3">Libraries</h2>
+      <p>The Arrow project includes libraries that implement the memory specification in many languages. They enable you to use the Arrow format as an efficient means of sharing data across languages and processes. Libraries are available for <a href="{{ site.baseurl }}/docs/c_glib/">C</a>, <a href="{{ site.baseurl }}/docs/cpp/">C++</a>, <a href="https://github.com/apache/arrow/blob/master/csharp/README.md">C#</a>, <a href="https://godoc.org/github.com/apache/arrow/go/arrow">Go</a>, <a href="{{ site.baseurl }}/docs/java/">Java</a>, <a href="{{ site.baseurl }}/docs/js/">JavaScript</a>, <a href="https://github.com/apache/arrow/blob/master/matlab/README.md">MATLAB</a>, <a href="{{ site.baseurl }}/docs/python/">Python</a>, <a href="{{ site.baseurl }}/docs/r/">R</a>, <a href="https://github.com/apache/arrow/blob/master/ruby/README.md">Ruby</a>, and <a href="https://docs.rs/crate/arrow/">Rust</a>.
       </p>
+      <p>See <a href="{{ site.baseurl }}/install/">how to install</a> and <a href="{{ site.baseurl }}/getting_started/">get started</a>.</p>
   </div>
   <div class="col-lg-4">
-      <h2 class="mt-3">Standard</h2>
-      <p>Apache Arrow is backed by key developers of 13 major open source projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm making it the de-facto standard for columnar in-memory analytics.</p>
-      <p>Learn more about projects that are <a href="{{ site.baseurl }}/powered_by/">Powered By Apache Arrow</a></p>
+      <h2 class="mt-3">Applications</h2>
+      <p>Arrow libraries provide a foundation for developers to build fast analytics applications. <a href="{{ site.baseurl }}/powered_by/">Many popular projects</a> use Arrow to ship columnar data efficiently or as the basis for analytic engines.</p>
+      <p>The libraries also include built-in features for working with data directly, including Parquet file reading and querying large datasets. See more Arrow <a href="{{ site.baseurl }}/use_cases/">use cases</a>.</p>

Review comment:
       In a sense, you're suggesting to pull up the "Community" blurb in the "Why Arrow" section? If so we could move the rest of the "Why Arrow" to the "Overview" page. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-site] pitrou commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r438886294



##########
File path: _layouts/home.html
##########
@@ -0,0 +1,21 @@
+{% include top.html %}
+
+<body class="wrap">
+  <header>
+    {% include header.html %}
+  </header>
+  <div class="big-arrow-bg">
+    <div class="container p-lg-4 centered">
+      <img src="{{ site.baseurl }}/img/arrow-inverse.png" style="max-width: 80%;"/>

Review comment:
       I think the logo is currently too big. It's taking much of the window's real estate. Reduce it to e.g. 40%?

##########
File path: community.md
##########
@@ -0,0 +1,66 @@
+---
+layout: default
+title: Apache Arrow Community
+description: Links and resources for participating in Apache Arrow
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+# Apache Arrow Community
+
+We welcome participation from everyone and encourage you to join us, ask questions, and get involved.
+
+All participation in the Apache Arrow project is governed by the Apache Software Foundation's [code of conduct](https://www.apache.org/foundation/policies/conduct.html).
+
+## Questions?
+
+### Mailing lists
+
+These arrow.apache.org mailing lists are for project discussion:
+
+<ul>
+  <li> <code>user@</code> is for questions on using Apache Arrow libraries {% include mailing_list_links.html list="user" %} </li>
+  <li> <code>dev@</code> is for discussions about contributing to the project development {% include mailing_list_links.html list="dev" %} </li>
+</ul>
+
+When emailing one of the lists, you may want to prefix the subject line with one or more tags, like `[C++] why did this segfault?`, `[Python] trouble with wheels`, etc., so that the appropriate people in the community notice the message.
+
+In addition, these lists log several activity streams:
+
+<ul>
+  <li> <code>issues@</code> for JIRA activity {% include mailing_list_links.html list="issues" %} </li>
+  <li> <code>github@</code> for all activity on the <a href="https://github.com/apache/arrow">apache/arrow</a> and <a href="https://github.com/apache/arrow-site">apache/arrow-site</a> repositories {% include mailing_list_links.html list="github" %} </li>
+  <li> <code>commits@</code> for just commits to those repositories (typically to <code>master</code> only) {% include mailing_list_links.html list="commits" %} </li>
+</ul>
+
+### Stack Overflow
+
+For questions on how to use Arrow libraries, you may want to use the Stack Overflow tag [apache-arrow](https://stackoverflow.com/questions/tagged/apache-arrow) in addition to the programming language. Some languages and subprojects may have their own tags (for example, [pyarrow](https://stackoverflow.com/questions/tagged/pyarrow)).
+
+### GitHub issues
+
+We support GitHub issues as a lightweight way to ask questions and engage with
+the Arrow developer community.
+That said, we use [JIRA](https://issues.apache.org/jira/browse/ARROW) for maintaining a queue of development work and as the public record for work on the project, and we use the mailing lists for development discussions, so to keep things in a single place, we prefer not to have lengthy discussions on GitHub issues.
+
+If you know your question is actually a bug report or feature request, we encourage you to go straight to JIRA. If you're not sure, feel free to make a GitHub issue to ask your question and we can triage/redirect your question there.
+
+## Contributing
+
+As mentioned above, we use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker and GitHub for source control. See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html) for more.

Review comment:
       Link to the Github repo page?

##########
File path: format.html
##########
@@ -0,0 +1,55 @@
+---
+layout: default
+title: Format
+description: Arrow Format
+---
+
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+
+<div class="row">
+  <div class="col-md-6">
+    <h2>Performance Advantage of Columnar In-Memory</h2>

Review comment:
       Why does some text have to be written out as HTML, rather than Markdown?







[GitHub] [arrow-site] rymurr commented on a change in pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r456546163



##########
File path: _posts/2020-07-16-1.0.0-release.md
##########
@@ -0,0 +1,90 @@
+---
+layout: post
+title: "Apache Arrow 1.0.0 Release"
+date: "2020-07-16 00:00:00 -0600"
+author: pmc
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 1.0.0 release. This covers
+over 2 months of development work and includes [**XX resolved issues**][1]
+from [**XX distinct contributors**][2]. See the Install Page to learn how to
+get the libraries for your platform.
+
+The release notes below are not exhaustive and only expose selected highlights
+of the release. Many other bugfixes and improvements have been made: we refer
+you to the [complete changelog][3].
+
+## 1.0 Format Release
+
+The 1.0 major release indicates that the Arrow columnar format is declared
+stable, with [forward and backward compatibility guarantees][5].
+
+Integration testing, link to implementation matrix
+
+Format changes: unions, unsigned int dictionary indices, decimal bit width, feature enum.
+
+## Community
+
+Since the last release, we have added two new committers:
+
+* Liya Fan
+* Ji Liu
+
+Thank you for all your contributions!
+
+<!-- Acknowledge and link to any new committers and PMC members since the last release. See previous release announcements for examples. -->
+
+## Arrow Flight RPC notes
+
+## C++ notes
+
+## C# notes
+
+## Go notes
+
+## Java notes
+

Review comment:
       done!







[GitHub] [arrow-site] nealrichardson commented on a change in pull request #63: Revamp website for 1.0 release

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r438953875



##########
File path: faq.md
##########
@@ -24,32 +24,160 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Flight RPC), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable

Review comment:
       Why not?

##########
File path: faq.md
##########
@@ -24,32 +24,160 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (including nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Flight RPC), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+Some implementations of Arrow are more complete and more stable than others.
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).
+
+## Getting started
+
+### Where can I get Arrow libraries?
+
+Arrow libraries for many languages are available through the usual package
+managers. See the [install]({{ site.baseurl }}/install/) page for specifics.
+
+## Getting involved
 
-The Arrow in-memory format is considered stable, and we intend to make only backwards-compatible changes, such as additional data types. We do not yet recommend the Arrow file format for long-term disk persistence of data; that said, it is perfectly acceptable to write Arrow memory to disk for purposes of memory mapping and caching.
+### I have some questions. How can I get help?
 
-We encourage people to start building Arrow-based in-memory computing applications now, and choose a suitable file format for disk storage if necessary. The Arrow libraries include adapters for several file formats, including Parquet, ORC, CSV, and JSON.
+The [Arrow mailing lists]({{ site.baseurl }}/community/) are the best place
+to ask questions. Don't be shy--we're here to help.
+
+### I tried to use Arrow and it didn't work. Can you fix it?
+
+Hopefully! Please make a detailed bug report--that's a valuable contribution
+to the project itself.
+See the [contribution guidelines]({{ site.baseurl }}/docs/developers/contributing.html)
+for how to make a report.
+
+### Arrow looks great and I'd totally use it if it only did X. When will it be done?
+
+We use [JIRA](https://issues.apache.org/jira/browse/ARROW) for our issue tracker.
+Search for an issue that matches your need. If you find one, feel free to
+comment on it and describe your use case--that will help whoever picks up
+the task. If you don't find one, make it.
+
+Ultimately, Arrow is software written by and for the community. If you don't
+see someone else in the community working on your issue, the best way to get
+it done is to pitch in yourself. We're more than willing to help you contribute
+successfully to the project.
+
+### How can I report a security vulnerability?
+
+Please send an email to [private@arrow.apache.org](mailto:private@arrow.apache.org).
+See the [security]({{ site.baseurl }}/security/) page for more.
+
+## Relation to other projects
 
 ### What is the difference between Apache Arrow and Apache Parquet?
+<!-- Revise this -->
 
-In short, Parquet files are designed for disk storage, while Arrow is designed for in-memory use, but you can put it on disk and then memory-map later. Arrow and Parquet are intended to be compatible with each other and used together in applications.
+Parquet files are designed for disk storage, while Arrow is designed for in-memory use,
+though you can put it on disk and then memory-map later. Arrow and Parquet are
+intended to be compatible with each other and used together in applications.
 
-Parquet is a columnar file format for data serialization. Reading a Parquet file requires decompressing and decoding its contents into some kind of in-memory data structure. It is designed to be space/IO-efficient at the expensive CPU utilization for decoding. It does not provide any data structures for in-memory computing. Parquet is a streaming format which must be decoded from start-to-end; while some "index page" facilities have been added to the storage format recently, random access operations are generally costly.
+Parquet is a storage format designed for maximum space efficiency, using
+advanced compression and encoding techniques.  It is ideal when you want to
+minimize disk usage while storing gigabytes of data, or perhaps more.
+This efficiency comes at the cost of relatively expensive reading into memory,
+as Parquet data cannot be directly operated on, and it must be decoded in
+large chunks.
 
-Arrow on the other hand is first and foremost a library providing columnar data structures for *in-memory computing*. When you read a Parquet file, you can decompress and decode the data *into* Arrow columnar data structures so that you can then perform analytics in-memory on the decoded data. The Arrow columnar format has some nice properties: random access is O(1) and each value cell is next to the previous and following one in memory, so it's efficient to iterate over.
+Conversely, Arrow is an in-memory format meant for direct and efficient use
+for computational purposes.  Arrow data is not compressed (or only lightly so,

Review comment:
       IPC files do support lz4/zstd compression, at least in the C++ implementation

##########
File path: faq.md
##########
@@ -24,32 +24,160 @@ limitations under the License.
 
 # Frequently Asked Questions
 
+## General
+
+### What *is* Arrow?
+
+Arrow is an open standard for how to represent columnar data in memory, along
+with libraries in many languages that implement that standard.  The Arrow format
+allows different programs and runtimes, perhaps written in different languages,
+to share data efficiently using a set of rich data types (included nested
+and user-defined data types).  The Arrow libraries make it easy to write such
+programs, by sparing the programmer from implementing low-level details of the
+Arrow format.
+
+Arrow additionally defines a streaming format and a file format for
+inter-process communication (IPC), based on the in-memory format.  It also
+defines a generic client-server RPC mechanism (Flight RPC), based on the
+IPC format, and implemented on top of the gRPC framework.  <!-- TODO links -->
+
+### Why create a new standard?
+
+<!-- Fill this in -->
+
+## Project status
+
 ### How stable is the Arrow format? Is it safe to use in my application?
+<!-- Revise this -->
+
+The Arrow *in-memory format* is considered stable, and we intend to make only
+backwards-compatible changes, such as additional data types.  It is used by
+many applications already, and you can trust that compatibility will not be
+broken.
+
+The Arrow *file format* (based on the Arrow IPC mechanism) is not recommended
+for long-term disk persistence of data; that said, it is perfectly acceptable
+to write Arrow memory to disk for purposes of memory mapping and caching.
+
+We encourage people to start building Arrow-based in-memory computing
+applications now, and choose a suitable file format for disk storage
+if necessary. The Arrow libraries include adapters for several file formats,
+including Parquet, ORC, CSV, and JSON.
+
+### How stable are the Arrow libraries?
+
+Some implementations of Arrow are more complete and more stable than others.
+We refer you to the [implementation matrix](https://github.com/apache/arrow/blob/master/docs/source/status.rst).

Review comment:
       This should probably link to the published docs page (which, of course, doesn't exist today but will when this goes live).







[GitHub] [arrow-site] nealrichardson commented on a change in pull request #63: ARROW-9335: [Website] Update website for 1.0

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #63:
URL: https://github.com/apache/arrow-site/pull/63#discussion_r457663485



##########
File path: .github/workflows/deploy.yml
##########
@@ -45,7 +45,7 @@ jobs:
       - name: Configure for GitHub Pages on master
         run: |
           owner=$(jq --raw-output .repository.owner.login ${GITHUB_EVENT_PATH})
-          repository=$(jq .repository.name ${GITHUB_EVENT_PATH})
+          repository=$(jq --raw-output .repository.name ${GITHUB_EVENT_PATH})

Review comment:
       I did something like that before but Kou also changed it and I got a merge conflict, must've just kept his version 🤷  https://github.com/apache/arrow-site/pull/63/commits/270c3225050c575126d8de23e78f028c51f19bf9
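For context on why `--raw-output` matters in the diff above, a small pure-Python analogy (jq itself not required; the payload shape is invented to resemble the GitHub event file): without the flag, jq re-encodes the extracted string as JSON, quotes included, and those quotes then leak into the shell variable.

```python
import json

# A payload shaped like the GitHub event file (names invented for illustration).
event = {"repository": {"name": "arrow-site", "owner": {"login": "apache"}}}

# `jq .repository.name` prints the value as JSON -- with surrounding quotes:
print(json.dumps(event["repository"]["name"]))  # "arrow-site"

# `jq --raw-output .repository.name` prints the bare string:
print(event["repository"]["name"])  # arrow-site
```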



