Posted to commits@arrow.apache.org by gi...@apache.org on 2022/12/26 21:22:48 UTC

[arrow-site] branch asf-site updated: Updating built site (build e64c7b4b55e2925c0e036152085acca58732428c)

This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 2fc56419a2e Updating built site (build e64c7b4b55e2925c0e036152085acca58732428c)
2fc56419a2e is described below

commit 2fc56419a2ee205cd9af535c3a851272052cf996
Author: Andrew Lamb <an...@nerdnetworks.org>
AuthorDate: Mon Dec 26 21:22:42 2022 +0000

    Updating built site (build e64c7b4b55e2925c0e036152085acca58732428c)
---
 .../index.html                                     | 840 +++++++++++++++++++++
 blog/index.html                                    |  15 +
 docs/c_glib/index.html                             |   4 +-
 feed.xml                                           | 683 ++++++++++++++---
 release/0.1.0.html                                 |   4 +-
 release/0.10.0.html                                |   4 +-
 release/0.11.0.html                                |   4 +-
 release/0.11.1.html                                |   4 +-
 release/0.12.0.html                                |   4 +-
 release/0.13.0.html                                |   4 +-
 release/0.14.0.html                                |   4 +-
 release/0.14.1.html                                |   4 +-
 release/0.15.0.html                                |   4 +-
 release/0.15.1.html                                |   4 +-
 release/0.16.0.html                                |   4 +-
 release/0.17.0.html                                |   4 +-
 release/0.17.1.html                                |   4 +-
 release/0.2.0.html                                 |   4 +-
 release/0.3.0.html                                 |   4 +-
 release/0.4.0.html                                 |   4 +-
 release/0.4.1.html                                 |   4 +-
 release/0.5.0.html                                 |   4 +-
 release/0.6.0.html                                 |   4 +-
 release/0.7.0.html                                 |   4 +-
 release/0.7.1.html                                 |   4 +-
 release/0.8.0.html                                 |   4 +-
 release/0.9.0.html                                 |   4 +-
 release/1.0.0.html                                 |   4 +-
 release/1.0.1.html                                 |   4 +-
 release/10.0.0.html                                |   4 +-
 release/10.0.1.html                                |   4 +-
 release/2.0.0.html                                 |   4 +-
 release/3.0.0.html                                 |   4 +-
 release/4.0.0.html                                 |   4 +-
 release/4.0.1.html                                 |   4 +-
 release/5.0.0.html                                 |   4 +-
 release/6.0.0.html                                 |   4 +-
 release/6.0.1.html                                 |   4 +-
 release/7.0.0.html                                 |   4 +-
 release/8.0.0.html                                 |   4 +-
 release/9.0.0.html                                 |   4 +-
 release/index.html                                 |   4 +-
 42 files changed, 1496 insertions(+), 198 deletions(-)

diff --git a/blog/2022/12/26/querying-parquet-with-millisecond-latency/index.html b/blog/2022/12/26/querying-parquet-with-millisecond-latency/index.html
new file mode 100644
index 00000000000..d9222d13f5a
--- /dev/null
+++ b/blog/2022/12/26/querying-parquet-with-millisecond-latency/index.html
@@ -0,0 +1,840 @@
+<!DOCTYPE html>
+<html lang="en-US">
+  <head>
+    <meta charset="UTF-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <!-- The above meta tags *must* come first in the head; any other head content must come *after* these tags -->
+    
+    <title>Querying Parquet with Millisecond Latency | Apache Arrow</title>
+    
+
+    <!-- Begin Jekyll SEO tag v2.8.0 -->
+<meta name="generator" content="Jekyll v4.2.0" />
+<meta property="og:title" content="Querying Parquet with Millisecond Latency" />
+<meta name="author" content="tustvold and alamb" />
+<meta property="og:locale" content="en_US" />
+<meta name="description" content="Querying Parquet with Millisecond Latency Note: this article was originally published on the InfluxData Blog. We believe that querying data in Apache Parquet files directly can achieve similar or better storage efficiency and query performance than most specialized file formats. While it requires significant engineering effort, the benefits of Parquet’s open format and broad ecosystem support make it the obvious choice for a wide class of data systems. I [...]
+<meta property="og:description" content="Querying Parquet with Millisecond Latency Note: this article was originally published on the InfluxData Blog. We believe that querying data in Apache Parquet files directly can achieve similar or better storage efficiency and query performance than most specialized file formats. While it requires significant engineering effort, the benefits of Parquet’s open format and broad ecosystem support make it the obvious choice for a wide class of data sys [...]
+<link rel="canonical" href="https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/" />
+<meta property="og:url" content="https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/" />
+<meta property="og:site_name" content="Apache Arrow" />
+<meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
+<meta property="og:type" content="article" />
+<meta property="article:published_time" content="2022-12-26T00:00:00-05:00" />
+<meta name="twitter:card" content="summary_large_image" />
+<meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
+<meta property="twitter:title" content="Querying Parquet with Millisecond Latency" />
+<meta name="twitter:site" content="@ApacheArrow" />
+<meta name="twitter:creator" content="@tustvold and alamb" />
+<script type="application/ld+json">
+{"@context":"https://schema.org","@type":"BlogPosting","author":{"@type":"Person","name":"tustvold and alamb"},"dateModified":"2022-12-26T00:00:00-05:00","datePublished":"2022-12-26T00:00:00-05:00","description":"Querying Parquet with Millisecond Latency Note: this article was originally published on the InfluxData Blog. We believe that querying data in Apache Parquet files directly can achieve similar or better storage efficiency and query performance than most specialized file formats. [...]
+<!-- End Jekyll SEO tag -->
+
+
+    <!-- favicons -->
+    <link rel="icon" type="image/png" sizes="16x16" href="/img/favicon-16x16.png" id="light1">
+    <link rel="icon" type="image/png" sizes="32x32" href="/img/favicon-32x32.png" id="light2">
+    <link rel="apple-touch-icon" type="image/png" sizes="180x180" href="/img/apple-touch-icon.png" id="light3">
+    <link rel="apple-touch-icon" type="image/png" sizes="120x120" href="/img/apple-touch-icon-120x120.png" id="light4">
+    <link rel="apple-touch-icon" type="image/png" sizes="76x76" href="/img/apple-touch-icon-76x76.png" id="light5">
+    <link rel="apple-touch-icon" type="image/png" sizes="60x60" href="/img/apple-touch-icon-60x60.png" id="light6">
+    <!-- dark mode favicons -->
+    <link rel="icon" type="image/png" sizes="16x16" href="/img/favicon-16x16-dark.png" id="dark1">
+    <link rel="icon" type="image/png" sizes="32x32" href="/img/favicon-32x32-dark.png" id="dark2">
+    <link rel="apple-touch-icon" type="image/png" sizes="180x180" href="/img/apple-touch-icon-dark.png" id="dark3">
+    <link rel="apple-touch-icon" type="image/png" sizes="120x120" href="/img/apple-touch-icon-120x120-dark.png" id="dark4">
+    <link rel="apple-touch-icon" type="image/png" sizes="76x76" href="/img/apple-touch-icon-76x76-dark.png" id="dark5">
+    <link rel="apple-touch-icon" type="image/png" sizes="60x60" href="/img/apple-touch-icon-60x60-dark.png" id="dark6">
+
+    <script>
+      // Switch to the dark-mode favicons if prefers-color-scheme: dark
+      function onUpdate() {
+        light1 = document.querySelector('link#light1');
+        light2 = document.querySelector('link#light2');
+        light3 = document.querySelector('link#light3');
+        light4 = document.querySelector('link#light4');
+        light5 = document.querySelector('link#light5');
+        light6 = document.querySelector('link#light6');
+
+        dark1 = document.querySelector('link#dark1');
+        dark2 = document.querySelector('link#dark2');
+        dark3 = document.querySelector('link#dark3');
+        dark4 = document.querySelector('link#dark4');
+        dark5 = document.querySelector('link#dark5');
+        dark6 = document.querySelector('link#dark6');
+
+        if (matcher.matches) {
+          light1.remove();
+          light2.remove();
+          light3.remove();
+          light4.remove();
+          light5.remove();
+          light6.remove();
+          document.head.append(dark1);
+          document.head.append(dark2);
+          document.head.append(dark3);
+          document.head.append(dark4);
+          document.head.append(dark5);
+          document.head.append(dark6);
+        } else {
+          dark1.remove();
+          dark2.remove();
+          dark3.remove();
+          dark4.remove();
+          dark5.remove();
+          dark6.remove();
+          document.head.append(light1);
+          document.head.append(light2);
+          document.head.append(light3);
+          document.head.append(light4);
+          document.head.append(light5);
+          document.head.append(light6);
+        }
+      }
+      matcher = window.matchMedia('(prefers-color-scheme: dark)');
+      matcher.addListener(onUpdate);
+      onUpdate();
+    </script>
+
+    <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
+
+    <link href="/css/main.css" rel="stylesheet">
+    <link href="/css/syntax.css" rel="stylesheet">
+    <script src="/javascript/main.js"></script>
+    
+    <!-- Matomo -->
+<script>
+  var _paq = window._paq = window._paq || [];
+  /* tracker methods like "setCustomDimension" should be called before "trackPageView" */
+  /* We explicitly disable cookie tracking to avoid privacy issues */
+  _paq.push(['disableCookies']);
+  _paq.push(['trackPageView']);
+  _paq.push(['enableLinkTracking']);
+  (function() {
+    var u="https://analytics.apache.org/";
+    _paq.push(['setTrackerUrl', u+'matomo.php']);
+    _paq.push(['setSiteId', '20']);
+    var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
+    g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
+  })();
+</script>
+<!-- End Matomo Code -->
+
+    
+  </head>
+
+
+<body class="wrap">
+  <header>
+    <nav class="navbar navbar-expand-md navbar-dark bg-dark">
+  
+  <a class="navbar-brand no-padding" href="/"><img src="/img/arrow-inverse-300px.png" height="40px"/></a>
+  
+   <button class="navbar-toggler ml-auto" type="button" data-toggle="collapse" data-target="#arrow-navbar" aria-controls="arrow-navbar" aria-expanded="false" aria-label="Toggle navigation">
+    <span class="navbar-toggler-icon"></span>
+  </button>
+
+    <!-- Collect the nav links, forms, and other content for toggling -->
+    <div class="collapse navbar-collapse justify-content-end" id="arrow-navbar">
+      <ul class="nav navbar-nav">
+        <li class="nav-item"><a class="nav-link" href="/overview/" role="button" aria-haspopup="true" aria-expanded="false">Overview</a></li>
+        <li class="nav-item"><a class="nav-link" href="/faq/" role="button" aria-haspopup="true" aria-expanded="false">FAQ</a></li>
+        <li class="nav-item"><a class="nav-link" href="/blog" role="button" aria-haspopup="true" aria-expanded="false">Blog</a></li>
+        <li class="nav-item dropdown">
+          <a class="nav-link dropdown-toggle" href="#"
+             id="navbarDropdownGetArrow" role="button" data-toggle="dropdown"
+             aria-haspopup="true" aria-expanded="false">
+             Get Arrow
+          </a>
+          <div class="dropdown-menu" aria-labelledby="navbarDropdownGetArrow">
+            <a class="dropdown-item" href="/install/">Install</a>
+            <a class="dropdown-item" href="/release/">Releases</a>
+            <a class="dropdown-item" href="https://github.com/apache/arrow">Source Code</a>
+          </div>
+        </li>
+        <li class="nav-item dropdown">
+          <a class="nav-link dropdown-toggle" href="#"
+             id="navbarDropdownDocumentation" role="button" data-toggle="dropdown"
+             aria-haspopup="true" aria-expanded="false">
+             Documentation
+          </a>
+          <div class="dropdown-menu" aria-labelledby="navbarDropdownDocumentation">
+            <a class="dropdown-item" href="/docs">Project Docs</a>
+            <a class="dropdown-item" href="/docs/format/Columnar.html">Format</a>
+            <hr/>
+            <a class="dropdown-item" href="/docs/c_glib">C GLib</a>
+            <a class="dropdown-item" href="/docs/cpp">C++</a>
+            <a class="dropdown-item" href="https://github.com/apache/arrow/blob/master/csharp/README.md">C#</a>
+            <a class="dropdown-item" href="https://godoc.org/github.com/apache/arrow/go/arrow">Go</a>
+            <a class="dropdown-item" href="/docs/java">Java</a>
+            <a class="dropdown-item" href="/docs/js">JavaScript</a>
+            <a class="dropdown-item" href="https://arrow.juliadata.org/stable/">Julia</a>
+            <a class="dropdown-item" href="https://github.com/apache/arrow/blob/master/matlab/README.md">MATLAB</a>
+            <a class="dropdown-item" href="/docs/python">Python</a>
+            <a class="dropdown-item" href="/docs/r">R</a>
+            <a class="dropdown-item" href="https://github.com/apache/arrow/blob/master/ruby/README.md">Ruby</a>
+            <a class="dropdown-item" href="https://docs.rs/crate/arrow/">Rust</a>
+          </div>
+        </li>
+        <li class="nav-item dropdown">
+          <a class="nav-link dropdown-toggle" href="#"
+             id="navbarDropdownSubprojects" role="button" data-toggle="dropdown"
+             aria-haspopup="true" aria-expanded="false">
+             Subprojects
+          </a>
+          <div class="dropdown-menu" aria-labelledby="navbarDropdownSubprojects">
+            <a class="dropdown-item" href="/adbc">ADBC</a>
+            <a class="dropdown-item" href="/docs/format/Flight.html">Arrow Flight</a>
+            <a class="dropdown-item" href="/docs/format/FlightSql.html">Arrow Flight SQL</a>
+            <a class="dropdown-item" href="/datafusion">DataFusion</a>
+          </div>
+        </li>
+        <li class="nav-item dropdown">
+          <a class="nav-link dropdown-toggle" href="#"
+             id="navbarDropdownCommunity" role="button" data-toggle="dropdown"
+             aria-haspopup="true" aria-expanded="false">
+             Community
+          </a>
+          <div class="dropdown-menu" aria-labelledby="navbarDropdownCommunity">
+            <a class="dropdown-item" href="/community/">Communication</a>
+            <a class="dropdown-item" href="/docs/developers/contributing.html">Contributing</a>
+            <a class="dropdown-item" href="https://github.com/apache/arrow/issues">Issue Tracker</a>
+            <a class="dropdown-item" href="/committers/">Governance</a>
+            <a class="dropdown-item" href="/use_cases/">Use Cases</a>
+            <a class="dropdown-item" href="/powered_by/">Powered By</a>
+            <a class="dropdown-item" href="/visual_identity/">Visual Identity</a>
+            <a class="dropdown-item" href="/security/">Security</a>
+            <a class="dropdown-item" href="https://www.apache.org/foundation/policies/conduct.html">Code of Conduct</a>
+          </div>
+        </li>
+        <li class="nav-item dropdown">
+          <a class="nav-link dropdown-toggle" href="#"
+             id="navbarDropdownASF" role="button" data-toggle="dropdown"
+             aria-haspopup="true" aria-expanded="false">
+             ASF Links
+          </a>
+          <div class="dropdown-menu dropdown-menu-right" aria-labelledby="navbarDropdownASF">
+            <a class="dropdown-item" href="https://www.apache.org/">ASF Website</a>
+            <a class="dropdown-item" href="https://www.apache.org/licenses/">License</a>
+            <a class="dropdown-item" href="https://www.apache.org/foundation/sponsorship.html">Donate</a>
+            <a class="dropdown-item" href="https://www.apache.org/foundation/thanks.html">Thanks</a>
+            <a class="dropdown-item" href="https://www.apache.org/security/">Security</a>
+          </div>
+        </li>
+      </ul>
+    </div><!-- /.navbar-collapse -->
+  </nav>
+
+  </header>
+
+  <div class="container p-4 pt-5">
+    <div class="col-md-8 mx-auto">
+      <main role="main" class="pb-5">
+        
+<h1>
+  Querying Parquet with Millisecond Latency
+</h1>
+<hr class="mt-4 mb-3">
+
+
+
+<p class="mb-4 pb-1">
+  <span class="badge badge-secondary">Published</span>
+  <span class="published mr-3">
+    26 Dec 2022
+  </span>
+  <br />
+  <span class="badge badge-secondary">By</span>
+  
+    tustvold and alamb
+  
+
+  
+</p>
+
+
+        <!--
+
+-->
+
+<h1 id="querying-parquet-with-millisecond-latency">Querying Parquet with Millisecond Latency</h1>
+<p><em>Note: this article was originally published on the <a href="https://www.influxdata.com/blog/querying-parquet-millisecond-latency">InfluxData Blog</a>.</em></p>
+
+<p>We believe that querying data in <a href="https://parquet.apache.org/">Apache Parquet</a> files directly can achieve similar or better storage efficiency and query performance than most specialized file formats. While it requires significant engineering effort, the benefits of Parquet’s open format and broad ecosystem support make it the obvious choice for a wide class of data systems.</p>
+
+<p>In this article we explain several advanced techniques, implemented in the <a href="https://docs.rs/parquet/27.0.0/parquet/">Apache Arrow Rust Parquet reader</a>, for quickly querying data stored in the Parquet format. Together these techniques make the Rust implementation one of, if not the, fastest implementations for querying Parquet files — be it on local disk or remote object storage. It is able to query GBs of Parquet in a <a href="https://github.com/tustvold/access-log- [...]
+
+<p>We would like to acknowledge and thank <a href="https://www.influxdata.com/">InfluxData</a> for their support of this work. InfluxData has a deep and continuing commitment to open source software, and it sponsored much of our time for writing this blog post as well as many contributions as part of building the <a href="https://www.influxdata.com/blog/influxdb-engine/">InfluxDB IOx Storage Engine</a>.</p>
+
+<h1 id="background">Background</h1>
+
+<p><a href="https://parquet.apache.org/">Apache Parquet</a> is an increasingly popular open format for storing <a href="https://www.influxdata.com/glossary/olap/">analytic datasets</a>, and has become the de-facto standard for cost-effective, DBMS-agnostic data storage. Initially created for the Hadoop ecosystem, Parquet’s reach now expands broadly across the data analytics ecosystem due to its compelling combination of:</p>
+
+<ul>
+  <li>High compression ratios</li>
+  <li>Amenability to commodity blob-storage such as S3</li>
+  <li>Broad ecosystem and tooling support</li>
+  <li>Portability across many different platforms and tools</li>
+  <li>Support for <a href="https://arrow.apache.org/blog/2022/10/05/arrow-parquet-encoding-part-1/">arbitrarily structured data</a></li>
+</ul>
+
+<p>Increasingly, other systems, such as <a href="https://duckdb.org/2021/06/25/querying-parquet.html">DuckDB</a> and <a href="https://docs.aws.amazon.com/redshift/latest/dg/c-using-spectrum.html#c-spectrum-overview">Redshift</a>, allow querying data stored in Parquet directly, but support is still often a secondary consideration compared to their native (custom) file formats. Such formats include the DuckDB <code class="language-plaintext highlighter-rouge">.duckdb</code> file format, the  [...]
+
+<p>For the first time, access to the same sophisticated query techniques, previously only available in closed source commercial implementations, is now available as open source. The required engineering capacity comes from large, well-run open source projects with global contributor communities, such as <a href="https://arrow.apache.org/">Apache Arrow</a> and <a href="https://impala.apache.org/">Apache Impala</a>.</p>
+
+<h1 id="parquet-file-format">Parquet file format</h1>
+
+<p>Before diving into the details of efficiently reading from <a href="https://www.influxdata.com/glossary/apache-parquet/">Parquet</a>, it is important to understand the file layout. The file format is carefully designed to quickly locate the desired information, skip irrelevant portions, and decode what remains efficiently.</p>
+
+<ul>
+  <li>The data in a Parquet file is broken into horizontal slices called <code class="language-plaintext highlighter-rouge">RowGroup</code>s</li>
+  <li>Each <code class="language-plaintext highlighter-rouge">RowGroup</code> contains a single <code class="language-plaintext highlighter-rouge">ColumnChunk</code> for each column in the schema</li>
+</ul>
+
+<p>For example, the following diagram illustrates a Parquet file with three columns “A”, “B” and “C” stored in two <code class="language-plaintext highlighter-rouge">RowGroup</code>s for a total of 6 <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s.</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃┏━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━┓          ┃
+┃┃┌ ─ ─ ─ ─ ─ ─ ┌ ─ ─ ─ ─ ─ ─ ┐┌ ─ ─ ─ ─ ─ ─  ┃          ┃
+┃┃             │                            │ ┃          ┃
+┃┃│             │             ││              ┃          ┃
+┃┃             │                            │ ┃          ┃
+┃┃│             │             ││              ┃ RowGroup ┃
+┃┃             │                            │ ┃     1    ┃
+┃┃│             │             ││              ┃          ┃
+┃┃             │                            │ ┃          ┃
+┃┃└ ─ ─ ─ ─ ─ ─ └ ─ ─ ─ ─ ─ ─ ┘└ ─ ─ ─ ─ ─ ─  ┃          ┃
+┃┃ColumnChunk 1  ColumnChunk 2 ColumnChunk 3  ┃          ┃
+┃┃ (Column "A")   (Column "B")  (Column "C")  ┃          ┃
+┃┗━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━┛          ┃
+┃┏━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━┓          ┃
+┃┃┌ ─ ─ ─ ─ ─ ─ ┌ ─ ─ ─ ─ ─ ─ ┐┌ ─ ─ ─ ─ ─ ─  ┃          ┃
+┃┃             │                            │ ┃          ┃
+┃┃│             │             ││              ┃          ┃
+┃┃             │                            │ ┃          ┃
+┃┃│             │             ││              ┃ RowGroup ┃
+┃┃             │                            │ ┃     2    ┃
+┃┃│             │             ││              ┃          ┃
+┃┃             │                            │ ┃          ┃
+┃┃└ ─ ─ ─ ─ ─ ─ └ ─ ─ ─ ─ ─ ─ ┘└ ─ ─ ─ ─ ─ ─  ┃          ┃
+┃┃ColumnChunk 4  ColumnChunk 5 ColumnChunk 6  ┃          ┃
+┃┃ (Column "A")   (Column "B")  (Column "C")  ┃          ┃
+┃┗━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━┛          ┃
+┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
+</code></pre></div></div>
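
The layout above can be sketched with a few illustrative Rust types. This is a hedged model of the logical structure only; the names mirror the article's terminology, not the actual `parquet` crate API:

```rust
// Illustrative model of Parquet's logical layout: a file holds RowGroups,
// and each RowGroup holds exactly one ColumnChunk per column in the schema.
struct ColumnChunk {
    column_name: String,
    pages: Vec<Vec<u8>>, // encoded Data Pages, appended sequentially
}

struct RowGroup {
    chunks: Vec<ColumnChunk>, // one per column in the schema
}

struct ParquetFile {
    row_groups: Vec<RowGroup>,
}

/// Builds the example from the diagram: 2 RowGroups over columns
/// "A", "B", and "C", i.e. 6 ColumnChunks in total.
fn example_file() -> ParquetFile {
    ParquetFile {
        row_groups: (0..2)
            .map(|_| RowGroup {
                chunks: ["A", "B", "C"]
                    .iter()
                    .map(|c| ColumnChunk {
                        column_name: c.to_string(),
                        pages: Vec::new(),
                    })
                    .collect(),
            })
            .collect(),
    }
}

/// Total number of ColumnChunks = row groups x columns.
fn total_column_chunks(file: &ParquetFile) -> usize {
    file.row_groups.iter().map(|rg| rg.chunks.len()).sum()
}
```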
+
+<p>The logical values for a <code class="language-plaintext highlighter-rouge">ColumnChunk</code> are written using one of the many <a href="https://parquet.apache.org/docs/file-format/data-pages/encodings/">available encodings</a> into one or more Data Pages appended sequentially in the file. At the end of a Parquet file is a footer, which contains important metadata, such as:</p>
+
+<ul>
+  <li>The file’s schema information such as column names and types</li>
+  <li>The locations of the <code class="language-plaintext highlighter-rouge">RowGroup</code> and <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s in the file</li>
+</ul>
+
+<p>The footer may also contain other specialized data structures:</p>
+
+<ul>
+  <li>Optional statistics for each <code class="language-plaintext highlighter-rouge">ColumnChunk</code> including min/max values and null counts</li>
+  <li>Optional pointers to <a href="https://github.com/apache/parquet-format/blob/54e53e5d7794d383529dd30746378f19a12afd58/src/main/thrift/parquet.thrift#L926-L932">OffsetIndexes</a> containing the location of each individual Page</li>
+  <li>Optional pointers to <a href="https://github.com/apache/parquet-format/blob/54e53e5d7794d383529dd30746378f19a12afd58/src/main/thrift/parquet.thrift#L938">ColumnIndex</a> containing row counts and summary statistics for each Page</li>
+  <li>Optional pointers to <a href="https://github.com/apache/parquet-format/blob/54e53e5d7794d383529dd30746378f19a12afd58/src/main/thrift/parquet.thrift#L621-L630">BloomFilterData</a>, which can quickly check if a value is present in a <code class="language-plaintext highlighter-rouge">ColumnChunk</code></li>
+</ul>
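
These optional footer statistics are what let a reader rule out data before fetching any pages. As a hedged illustration (with simplified types of our own, not the `parquet` crate's), a min/max range check can show that a `ColumnChunk` cannot possibly contain a sought value:

```rust
/// Simplified stand-in for the per-ColumnChunk statistics a Parquet
/// footer may carry (illustrative only, not the real metadata types).
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: i64,
    max: i64,
    null_count: u64,
}

/// Returns true if a chunk with these statistics *could* contain `value`.
/// If statistics were not written, the reader must assume it could.
fn may_contain(stats: Option<ColumnStats>, value: i64) -> bool {
    match stats {
        None => true,
        Some(s) => s.min <= value && value <= s.max,
    }
}
```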
+
+<p>For example, the logical structure of 2 Row Groups and 6 <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s in the previous diagram might be stored in a Parquet file as shown in the following diagram (not to scale). The pages for the <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s come first, followed by the footer. The data, the effectiveness of the encoding scheme, and the settings of the Parquet encoder determine the number and size of  [...]
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃
+┃ Data Page for ColumnChunk 1 ("A")             ◀─┃─ ─ ─│
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃     │
+┃ Data Page for ColumnChunk 1 ("A")               ┃
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃     │
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃
+┃ Data Page for ColumnChunk 2 ("B")               ┃     │
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃     │
+┃ Data Page for ColumnChunk 3 ("C")               ┃
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃     │
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃
+┃ Data Page for ColumnChunk 3 ("C")               ┃     │
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃     │
+┃ Data Page for ColumnChunk 3 ("C")               ┃
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃     │
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃
+┃ Data Page for ColumnChunk 4 ("A")             ◀─┃─ ─ ─│─ ┐
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃     │  │
+┃ Data Page for ColumnChunk 5 ("B")               ┃
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃     │  │
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃
+┃ Data Page for ColumnChunk 5 ("B")               ┃     │  │
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃     │  │
+┃ Data Page for ColumnChunk 5 ("B")               ┃
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃     │  │
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃
+┃ Data Page for ColumnChunk 6 ("C")               ┃     │  │
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃
+┃┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃     │  │
+┃┃Footer                                        ┃ ┃
+┃┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ ┃     │  │
+┃┃ ┃File Metadata                             ┃ ┃ ┃
+┃┃ ┃ Schema, etc                              ┃ ┃ ┃     │  │
+┃┃ ┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓     ┃ ┃ ┃
+┃┃ ┃ ┃Row Group 1 Metadata              ┃     ┃ ┃ ┃     │  │
+┃┃ ┃ ┃┏━━━━━━━━━━━━━━━━━━━┓             ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┃Column "A" Metadata┃ Location of ┃     ┃ ┃ ┃     │  │
+┃┃ ┃ ┃┗━━━━━━━━━━━━━━━━━━━┛ first Data  ┣ ─ ─ ╋ ╋ ╋ ─ ─
+┃┃ ┃ ┃┏━━━━━━━━━━━━━━━━━━━┓ Page, row   ┃     ┃ ┃ ┃        │
+┃┃ ┃ ┃┃Column "B" Metadata┃ counts,     ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┗━━━━━━━━━━━━━━━━━━━┛ sizes,      ┃     ┃ ┃ ┃        │
+┃┃ ┃ ┃┏━━━━━━━━━━━━━━━━━━━┓ min/max     ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┃Column "C" Metadata┃ values, etc ┃     ┃ ┃ ┃        │
+┃┃ ┃ ┃┗━━━━━━━━━━━━━━━━━━━┛             ┃     ┃ ┃ ┃
+┃┃ ┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛     ┃ ┃ ┃        │
+┃┃ ┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓     ┃ ┃ ┃
+┃┃ ┃ ┃Row Group 2 Metadata              ┃     ┃ ┃ ┃        │
+┃┃ ┃ ┃┏━━━━━━━━━━━━━━━━━━━┓ Location of ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┃Column "A" Metadata┃ first Data  ┃     ┃ ┃ ┃        │
+┃┃ ┃ ┃┗━━━━━━━━━━━━━━━━━━━┛ Page, row   ┣ ─ ─ ╋ ╋ ╋ ─ ─ ─ ─
+┃┃ ┃ ┃┏━━━━━━━━━━━━━━━━━━━┓ counts,     ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┃Column "B" Metadata┃ sizes,      ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┗━━━━━━━━━━━━━━━━━━━┛ min/max     ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┏━━━━━━━━━━━━━━━━━━━┓ values, etc ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┃Column "C" Metadata┃             ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┗━━━━━━━━━━━━━━━━━━━┛             ┃     ┃ ┃ ┃
+┃┃ ┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛     ┃ ┃ ┃
+┃┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃ ┃
+┃┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃
+┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
+</code></pre></div></div>
+
+<p>There are many important criteria to consider when creating Parquet files such as how to optimally order/cluster data and structure it into <code class="language-plaintext highlighter-rouge">RowGroup</code>s and Data Pages. Such “physical design” considerations are complex, worthy of their own series of articles, and not addressed in this blog post. Instead, we focus on how to use the available structure to make queries very fast.</p>
+
+<h1 id="optimizing-queries">Optimizing queries</h1>
+
+<p>In any query processing system, the following techniques generally improve performance:</p>
+
+<ol>
+  <li>Reduce the data that must be transferred from secondary storage for processing (reduce I/O)</li>
+  <li>Reduce the computational load for decoding the data (reduce CPU)</li>
+  <li>Interleave/pipeline the reading and decoding of the data (improve parallelism)</li>
+</ol>
+
+<p>The same principles apply to querying Parquet files, as we describe below.</p>
+
+<h1 id="decode-optimization">Decode optimization</h1>
+
+<p>Parquet achieves impressive compression ratios by using <a href="https://parquet.apache.org/docs/file-format/data-pages/encodings/">sophisticated encoding techniques</a> such as run-length encoding, dictionary encoding, delta encoding, and others. Consequently, the CPU-bound task of decoding can dominate query latency. Parquet readers can use a number of techniques to improve the latency and throughput of this task, as we have done in the Rust implementation.</p>
+
+<h2 id="vectorized-decode">Vectorized decode</h2>
+
+<p>Most analytic systems decode multiple values at a time to a columnar memory format, such as Apache Arrow, rather than processing data row-by-row. This is often called vectorized or columnar processing, and is beneficial because it:</p>
+
+<ul>
+  <li>Amortizes the dispatch overhead of switching on the type of column being decoded</li>
+  <li>Improves cache locality by reading consecutive values from a <code class="language-plaintext highlighter-rouge">ColumnChunk</code></li>
+  <li>Often allows multiple values to be decoded in a single instruction</li>
+  <li>Avoids many small heap allocations by using a single large allocation, yielding significant savings for variable length types such as strings and byte arrays</li>
+</ul>
+
+<p>Thus, the Rust Parquet reader implements specialized decoders for reading Parquet directly into a <a href="https://www.influxdata.com/glossary/column-database/">columnar</a> memory format (Arrow Arrays).</p>
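<p>To illustrate why decoding many values at once pays off, the following toy run-length decoder (purely illustrative names, not the parquet crate's API) expands runs directly into a pre-sized columnar buffer, paying one dispatch per run instead of one per value:</p>

```rust
/// A toy run-length-encoded column: (value, run_length) pairs, loosely
/// analogous to (but much simpler than) Parquet's RLE encoding.
fn decode_rle_vectorized(runs: &[(i64, usize)]) -> Vec<i64> {
    // Pre-size the output once, avoiding per-value allocations.
    let total: usize = runs.iter().map(|(_, len)| len).sum();
    let mut out = Vec::with_capacity(total);
    for &(value, len) in runs {
        // One dispatch per run; the inner fill is a tight loop that
        // compilers readily vectorize.
        out.resize(out.len() + len, value);
    }
    out
}

fn main() {
    let runs = [(7, 3), (0, 2), (42, 4)];
    let decoded = decode_rle_vectorized(&runs);
    assert_eq!(decoded, vec![7, 7, 7, 0, 0, 42, 42, 42, 42]);
}
```

<p>A real decoder does the same thing per encoding (dictionary, delta, plain), writing into Arrow array builders rather than a plain <code class="language-plaintext highlighter-rouge">Vec</code>.</p>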
+
+<h2 id="streaming-decode">Streaming decode</h2>
+
+<p>There is no relationship between which rows are stored in which Pages across <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s. For example, the logical values for the 10,000th row may be in the first page of column A and in the third page of column B.</p>
+
+<p>The simplest approach to vectorized decoding, and the one often initially implemented in Parquet decoders, is to decode an entire <code class="language-plaintext highlighter-rouge">RowGroup</code> (or <code class="language-plaintext highlighter-rouge">ColumnChunk</code>) at a time.</p>
+
+<p>However, given Parquet’s high compression ratios, a single <code class="language-plaintext highlighter-rouge">RowGroup</code> may well contain millions of rows. Decoding so many rows at once is non-optimal because it:</p>
+
+<ul>
+  <li><strong>Requires large amounts of intermediate RAM</strong>: typical in-memory formats optimized for processing, such as Apache Arrow, require much more memory than their Parquet-encoded form.</li>
+  <li><strong>Increases query latency</strong>: Subsequent processing steps (like filtering or aggregation) can only begin once the entire <code class="language-plaintext highlighter-rouge">RowGroup</code> (or <code class="language-plaintext highlighter-rouge">ColumnChunk</code>) is decoded.</li>
+</ul>
+
+<p>As such, the best Parquet readers support “streaming” data out by producing batches of rows of a configurable size on demand. The batch size must be large enough to amortize decode overhead, but small enough for efficient memory usage and to allow downstream processing to begin concurrently while the subsequent batch is decoded.</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃
+┃ Data Page for ColumnChunk 1 │◀┃─                   ┌── ─── ─── ─── ─── ┐
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃ │   ┏━━━━━━━┓        ┌ ─ ┐ ┌ ─ ┐ ┌ ─ ┐ │
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃     ┃       ┃      │                   │
+┃ Data Page for ColumnChunk 1 │ ┃ │   ┃       ┃   ─ ▶│ │   │ │   │ │   │
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃  ─ ─┃       ┃─ ┤   │  ─ ─   ─ ─   ─ ─  │
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃ │   ┃       ┃           A    B     C   │
+┃ Data Page for ColumnChunk 2 │◀┃─    ┗━━━━━━━┛  │   └── ─── ─── ─── ─── ┘
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃ │    Parquet
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃      Decoder   │            ...
+┃ Data Page for ColumnChunk 3 │ ┃ │
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃                │   ┌── ─── ─── ─── ─── ┐
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃ │                    ┌ ─ ┐ ┌ ─ ┐ ┌ ─ ┐ │
+┃ Data Page for ColumnChunk 3 │◀┃─               │   │                   │
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃                 ─ ▶│ │   │ │   │ │   │
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃                    │  ─ ─   ─ ─   ─ ─  │
+┃ Data Page for ColumnChunk 3 │ ┃                         A    B     C   │
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃                    └── ─── ─── ─── ─── ┘
+┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
+
+      Parquet file                                    Smaller in memory
+                                                         batches for
+                                                         processing
+</code></pre></div></div>
+
+<p>While streaming is not a complicated feature to explain, the stateful nature of decoding, especially across multiple columns and <a href="https://arrow.apache.org/blog/2022/10/05/arrow-parquet-encoding-part-1/">arbitrarily nested data</a>, where the relationship between rows and values is not fixed, requires <a href="https://github.com/apache/arrow-rs/blob/b7af85cb8dfe6887bb3fd43d1d76f659473b6927/parquet/src/arrow/record_reader/mod.rs">complex intermediate buffering</a> and significan [...]
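<p>The batching behavior can be sketched as an iterator that hands out fixed-size batches until a <code class="language-plaintext highlighter-rouge">RowGroup</code> is exhausted (the real reader yields Arrow <code class="language-plaintext highlighter-rouge">RecordBatch</code>es; the struct below is a hypothetical stand-in that only tracks row counts):</p>

```rust
/// Toy illustration of streaming decode: rather than materializing an
/// entire RowGroup, the reader hands out batches of a configurable size.
struct BatchingReader {
    rows_remaining: usize,
    batch_size: usize,
}

impl Iterator for BatchingReader {
    // A real implementation would yield decoded batches; here each item
    // just reports how many rows the batch holds.
    type Item = usize;
    fn next(&mut self) -> Option<usize> {
        if self.rows_remaining == 0 {
            return None;
        }
        let n = self.rows_remaining.min(self.batch_size);
        self.rows_remaining -= n;
        Some(n)
    }
}

fn main() {
    // A "RowGroup" of 10,000 rows streamed as 4096-row batches:
    let batches: Vec<usize> =
        BatchingReader { rows_remaining: 10_000, batch_size: 4096 }.collect();
    assert_eq!(batches, vec![4096, 4096, 1808]);
}
```

<p>Downstream operators can begin work on the first 4096-row batch while the next one is still being decoded.</p>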
+
+<h2 id="dictionary-preservation">Dictionary preservation</h2>
+
+<p>Dictionary Encoding, also called <a href="https://pandas.pydata.org/docs/user_guide/categorical.html">categorical</a> encoding, is a technique where each value in a column is not stored directly, but instead, an index in a separate list called a “Dictionary” is stored. This technique achieves many of the benefits of <a href="https://en.wikipedia.org/wiki/Third_normal_form#:~:text=Third%20normal%20form%20(3NF)%20is,in%201971%20by%20Edgar%20F.">third normal form</a> for columns that hav [...]
+
+<p>The first page in a <code class="language-plaintext highlighter-rouge">ColumnChunk</code> can optionally be a dictionary page, containing a list of values of the column’s type. Subsequent pages within this <code class="language-plaintext highlighter-rouge">ColumnChunk</code> can then encode an index into this dictionary, instead of encoding the values directly.</p>
+
+<p>Given the effectiveness of this encoding, if a Parquet decoder simply decodes dictionary data into the native type, it will inefficiently replicate the same value over and over again, which is especially disastrous for string data. To handle dictionary-encoded data efficiently, the encoding must be preserved during decode. Conveniently, many columnar formats, such as the Arrow <a href="https://docs.rs/arrow/27.0.0/arrow/array/struct.DictionaryArray.html">DictionaryArray</a>, support s [...]
+
+<p>Preserving dictionary encoding drastically improves performance when reading to an Arrow array, in some cases in excess of <a href="https://github.com/apache/arrow-rs/pull/1180">60x</a>, as well as using significantly less memory.</p>
+
+<p>The major complicating factor for preserving dictionaries is that the dictionaries are stored per <code class="language-plaintext highlighter-rouge">ColumnChunk</code>, and therefore the dictionary changes between <code class="language-plaintext highlighter-rouge">RowGroup</code>s. The reader must automatically recompute a dictionary for batches that span multiple <code class="language-plaintext highlighter-rouge">RowGroup</code>s, while also optimizing for the case that batch sizes d [...]
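<p>The shape of a preserved dictionary column can be sketched as a small list of distinct values plus one integer key per row, mirroring Arrow's <code class="language-plaintext highlighter-rouge">DictionaryArray</code> (the struct and byte counts below are illustrative only):</p>

```rust
/// Sketch of dictionary preservation: keep the (small) dictionary of
/// distinct strings plus integer keys, instead of materializing every
/// string value.
struct DictColumn {
    dictionary: Vec<String>, // distinct values
    keys: Vec<u32>,          // one index per row
}

impl DictColumn {
    fn value(&self, row: usize) -> &str {
        &self.dictionary[self.keys[row] as usize]
    }
    /// Approximate heap bytes used by the preserved representation.
    fn preserved_bytes(&self) -> usize {
        self.dictionary.iter().map(|s| s.len()).sum::<usize>() + self.keys.len() * 4
    }
    /// Approximate heap bytes if every value were materialized.
    fn materialized_bytes(&self) -> usize {
        self.keys.iter().map(|&k| self.dictionary[k as usize].len()).sum()
    }
}

fn main() {
    let col = DictColumn {
        dictionary: vec!["apache".into(), "arrow".into(), "parquet".into()],
        keys: (0..30_000).map(|i| (i % 3) as u32).collect(),
    };
    assert_eq!(col.value(2), "parquet");
    // Preserving the dictionary stores 3 strings plus 30k u32 keys
    // instead of 30k copies of the strings themselves.
    assert!(col.preserved_bytes() < col.materialized_bytes());
}
```

<p>The savings grow with the length of the values and the number of rows per distinct value.</p>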
+
+<h1 id="projection-pushdown">Projection pushdown</h1>
+
+<p>The most basic Parquet optimization, and the one most commonly described for Parquet files, is <em>projection pushdown</em>, which reduces both I/O and CPU requirements. Projection in this context means “selecting some but not all of the columns.” Given how Parquet organizes data, it is straightforward to read and decode only the <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s required for the referenced columns.</p>
+
+<p>For example, consider a SQL query of the form</p>
+
+<pre><code class="language-SQL">SELECT B FROM table WHERE A &gt; 35
+</code></pre>
+
+<p>This query only needs data for columns A and B (and not C) and the projection can be “pushed down” to the Parquet reader.</p>
+
+<p>Specifically, using the information in the footer, the Parquet reader can entirely skip fetching (I/O) and decoding (CPU) the Data Pages that store data for column C (<code class="language-plaintext highlighter-rouge">ColumnChunk</code> 3 and <code class="language-plaintext highlighter-rouge">ColumnChunk</code> 6 in our example).</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                             ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+                             ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                       ┌─────▶ Data Page for ColumnChunk 1 ("A") ┃
+                       │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+                       │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                       ├─────▶ Data Page for ColumnChunk 1 ("A") ┃
+                       │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+                       │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                       ├─────▶ Data Page for ColumnChunk 2 ("B") ┃
+                       │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+                       │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                       │     ┃ Data Page for ColumnChunk 3 ("C") ┃
+                       │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+   A query that        │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+  accesses only        │     ┃ Data Page for ColumnChunk 3 ("C") ┃
+ columns A and B       │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+can read only the      │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+ relevant pages,  ─────┤     ┃ Data Page for ColumnChunk 3 ("C") ┃
+skipping any Data      │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+Page for column C      │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                       ├─────▶ Data Page for ColumnChunk 4 ("A") ┃
+                       │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+                       │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                       ├─────▶ Data Page for ColumnChunk 5 ("B") ┃
+                       │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+                       │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                       ├─────▶ Data Page for ColumnChunk 5 ("B") ┃
+                       │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+                       │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                       └─────▶ Data Page for ColumnChunk 5 ("B") ┃
+                             ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+                             ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                             ┃ Data Page for ColumnChunk 6 ("C") ┃
+                             ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+                             ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
+</code></pre></div></div>
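<p>Because the footer records where each <code class="language-plaintext highlighter-rouge">ColumnChunk</code> lives in the file, projection pushdown amounts to filtering byte ranges before any fetch happens. A minimal sketch, using a hypothetical metadata layout (not the parquet crate's actual types):</p>

```rust
/// Hypothetical footer entry: which column a chunk belongs to and where
/// its bytes live in the file.
struct ColumnChunkMeta {
    column: &'static str,
    byte_range: (u64, u64), // (offset, length)
}

/// Keep only the byte ranges for the projected columns.
fn ranges_for_projection(
    chunks: &[ColumnChunkMeta],
    projection: &[&str],
) -> Vec<(u64, u64)> {
    chunks
        .iter()
        .filter(|c| projection.contains(&c.column))
        .map(|c| c.byte_range)
        .collect()
}

fn main() {
    // One RowGroup with three ColumnChunks, as in the figure above.
    let chunks = [
        ColumnChunkMeta { column: "A", byte_range: (0, 100) },
        ColumnChunkMeta { column: "B", byte_range: (100, 80) },
        ColumnChunkMeta { column: "C", byte_range: (180, 500) },
    ];
    // A query touching only A and B never fetches column C's 500 bytes:
    let fetch = ranges_for_projection(&chunks, &["A", "B"]);
    assert_eq!(fetch, vec![(0, 100), (100, 80)]);
}
```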
+
+<h1 id="predicate-pushdown">Predicate pushdown</h1>
+
+<p>Similar to projection pushdown, <strong>predicate</strong> pushdown also avoids fetching and decoding data from Parquet files, but does so using filter expressions. This technique typically requires closer integration with a query engine such as <a href="https://arrow.apache.org/datafusion/">DataFusion</a>, to determine valid predicates and evaluate them during the scan. Unfortunately without careful API design, the Parquet decoder and query engine can end up tightly coupled, preventi [...]
+
+<h2 id="rowgroup-pruning"><code class="language-plaintext highlighter-rouge">RowGroup</code> pruning</h2>
+
+<p>The simplest form of predicate pushdown, supported by many Parquet based query engines, uses the statistics stored in the footer to skip entire <code class="language-plaintext highlighter-rouge">RowGroup</code>s. We call this operation <code class="language-plaintext highlighter-rouge">RowGroup</code> <em>pruning</em>, and it is analogous to <a href="https://docs.oracle.com/database/121/VLDBG/GUID-E677C85E-C5E3-4927-B3DF-684007A7B05D.htm#VLDBG00401">partition pruning</a> in many class [...]
+
+<p>For the example query above, if the maximum value for A in a particular <code class="language-plaintext highlighter-rouge">RowGroup</code> is less than 35, the decoder can skip fetching and decoding any <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s from that <strong>entire</strong> <code class="language-plaintext highlighter-rouge">RowGroup</code>.</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃Row Group 1 Metadata                      ┃
+┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃
+┃ ┃Column "A" Metadata    Min:0 Max:15   ┃◀╋ ┐
+┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃       Using the min
+┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ │     and max values
+┃ ┃Column "B" Metadata                   ┃ ┃       from the
+┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃ │     metadata,
+┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃       RowGroup 1  can
+┃ ┃Column "C" Metadata                   ┃ ┃ ├ ─ ─ be entirely
+┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃       skipped
+┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ │     (pruned) when
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓       searching for
+┃Row Group 2 Metadata                      ┃ │     rows with A &gt;
+┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃       35,
+┃ ┃Column "A" Metadata   Min:10 Max:50   ┃◀╋ ┘
+┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃
+┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃
+┃ ┃Column "B" Metadata                   ┃ ┃
+┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃
+┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃
+┃ ┃Column "C" Metadata                   ┃ ┃
+┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃
+┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
+</code></pre></div></div>
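<p>The pruning decision itself is a simple comparison against the footer statistics. A sketch, using illustrative structs rather than the parquet crate's metadata types:</p>

```rust
/// Min/max statistics for column A within one RowGroup (mirroring the
/// figure above; other columns' stats omitted for brevity).
struct RowGroupStats {
    min_a: i64,
    max_a: i64,
}

/// Return the indexes of RowGroups that might contain rows with
/// `A > threshold`; all others can be skipped without any I/O.
fn prune_row_groups(groups: &[RowGroupStats], threshold: i64) -> Vec<usize> {
    groups
        .iter()
        .enumerate()
        // Keep a RowGroup only if some value of A *could* exceed threshold.
        .filter(|(_, g)| g.max_a > threshold)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // The two RowGroups from the figure above:
    let groups = [
        RowGroupStats { min_a: 0, max_a: 15 },  // Row Group 1
        RowGroupStats { min_a: 10, max_a: 50 }, // Row Group 2
    ];
    // For A > 35, Row Group 1 (max 15) is pruned entirely:
    assert_eq!(prune_row_groups(&groups, 35), vec![1]);
}
```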
+
+<p>Note that pruning on minimum and maximum values is effective for many data layouts and column types, but not all. Specifically, it is not as effective for columns with many distinct pseudo-random values (e.g. identifiers or uuids). Thankfully for this use case, Parquet also supports per <code class="language-plaintext highlighter-rouge">ColumnChunk</code> <a href="https://github.com/apache/parquet-format/blob/master/BloomFilter.md">Bloom Filters</a>. We are actively working on <a href [...]
+
+<h2 id="page-pruning">Page pruning</h2>
+
+<p>A more sophisticated form of predicate pushdown uses the optional <a href="https://github.com/apache/parquet-format/blob/master/PageIndex.md">page index</a> in the footer metadata to rule out entire Data Pages. The decoder decodes only the corresponding rows from other columns, often skipping entire pages.</p>
+
+<p>The fact that pages in different <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s often contain different numbers of rows complicates this optimization. While the page index may identify the needed pages from one column, pruning a page from one column doesn’t immediately rule out entire pages in other columns.</p>
+
+<p>Page pruning proceeds as follows:</p>
+
+<ul>
+  <li>Uses the predicates in combination with the page index to identify pages to skip</li>
+  <li>Uses the offset index to determine what row ranges correspond to non-skipped pages</li>
+  <li>Computes the intersection of ranges across non-skipped pages, and decodes only those rows</li>
+</ul>
+
+<p>This last point is highly non-trivial to implement, especially for nested lists where <a href="https://arrow.apache.org/blog/2022/10/08/arrow-parquet-encoding-part-2/">a single row may correspond to multiple values</a>. Fortunately, the Rust Parquet reader hides this complexity internally, and can decode arbitrary <a href="https://docs.rs/parquet/27.0.0/parquet/arrow/arrow_reader/struct.RowSelection.html">RowSelections</a>.</p>
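<p>The intersection step can be sketched with plain half-open row ranges (the parquet crate models this with its <code class="language-plaintext highlighter-rouge">RowSelection</code> type; the function below is a standalone simplification):</p>

```rust
/// Intersect two sets of half-open (start, end) row ranges: only rows
/// selected by *every* column's predicate survive to be decoded.
fn intersect(a: &[(usize, usize)], b: &[(usize, usize)]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for &(a0, a1) in a {
        for &(b0, b1) in b {
            let (s, e) = (a0.max(b0), a1.min(b1));
            if s < e {
                out.push((s, e)); // overlapping portion, if any
            }
        }
    }
    out
}

fn main() {
    // Column A's predicate keeps rows [200, 350); column B's keeps [100, 245):
    let sel = intersect(&[(200, 350)], &[(100, 245)]);
    assert_eq!(sel, vec![(200, 245)]);
}
```

<p>A production implementation sorts and merges the ranges instead of the nested loops shown here.</p>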
+
+<p>For example, to scan Columns A and B, stored in 5 Data Pages as shown in the figure below:</p>
+
+<p>If the predicate is <code class="language-plaintext highlighter-rouge">A &gt; 35</code>,</p>
+
+<ul>
+  <li>Page 1 is pruned using the page index (its maximum value is <code class="language-plaintext highlighter-rouge">20</code>), leaving a RowSelection of <code class="language-plaintext highlighter-rouge">[200-&gt;onwards]</code></li>
+  <li>The Parquet reader skips Page 3 entirely (its last row index is <code class="language-plaintext highlighter-rouge">99</code>)</li>
+  <li>Only the relevant rows are read, by fetching pages 2, 4, and 5</li>
+</ul>
+
+<p>If the predicate is instead <code class="language-plaintext highlighter-rouge">A &gt; 35 AND B = "F"</code>, the page index is even more effective:</p>
+
+<ul>
+  <li>Using <code class="language-plaintext highlighter-rouge">A &gt; 35</code> yields a RowSelection of <code class="language-plaintext highlighter-rouge">[200-&gt;onwards]</code> as before</li>
+  <li>Using <code class="language-plaintext highlighter-rouge">B = "F"</code> on the remaining Page 4 and Page 5 of B yields a RowSelection of <code class="language-plaintext highlighter-rouge">[100-244]</code></li>
+  <li>Intersecting the two RowSelections leaves a combined RowSelection of <code class="language-plaintext highlighter-rouge">[200-244]</code></li>
+  <li>The Parquet reader decodes only those 50 rows from Page 2 and Page 4.</li>
+</ul>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┏━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━
+   ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ┃
+┃     ┌──────────────┐  │     ┌──────────────┐  │  ┃
+┃  │  │              │     │  │              │     ┃
+┃     │              │  │     │     Page     │  │
+   │  │              │     │  │      3       │     ┃
+┃     │              │  │     │   min: "A"   │  │  ┃
+┃  │  │              │     │  │   max: "C"   │     ┃
+┃     │     Page     │  │     │ first_row: 0 │  │
+   │  │      1       │     │  │              │     ┃
+┃     │   min: 10    │  │     └──────────────┘  │  ┃
+┃  │  │   max: 20    │     │  ┌──────────────┐     ┃
+┃     │ first_row: 0 │  │     │              │  │
+   │  │              │     │  │     Page     │     ┃
+┃     │              │  │     │      4       │  │  ┃
+┃  │  │              │     │  │   min: "D"   │     ┃
+┃     │              │  │     │   max: "G"   │  │
+   │  │              │     │  │first_row: 100│     ┃
+┃     └──────────────┘  │     │              │  │  ┃
+┃  │  ┌──────────────┐     │  │              │     ┃
+┃     │              │  │     └──────────────┘  │
+   │  │     Page     │     │  ┌──────────────┐     ┃
+┃     │      2       │  │     │              │  │  ┃
+┃  │  │   min: 30    │     │  │     Page     │     ┃
+┃     │   max: 40    │  │     │      5       │  │
+   │  │first_row: 200│     │  │   min: "H"   │     ┃
+┃     │              │  │     │   max: "Z"   │  │  ┃
+┃  │  │              │     │  │first_row: 250│     ┃
+┃     └──────────────┘  │     │              │  │
+   │                       │  └──────────────┘     ┃
+┃   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃
+┃       ColumnChunk            ColumnChunk         ┃
+┃            A                      B
+ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━┛
+</code></pre></div></div>
+
+<p>Support for reading and writing these indexes from Arrow C++, and by extension pyarrow/pandas, is tracked in <a href="https://issues.apache.org/jira/browse/PARQUET-1404">PARQUET-1404</a>.</p>
+
+<h2 id="late-materialization">Late materialization</h2>
+
+<p>The two previous forms of predicate pushdown operate only on the metadata stored for <code class="language-plaintext highlighter-rouge">RowGroup</code>s, <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s, and Data Pages, prior to decoding values. However, the same techniques also extend to values of one or more columns <em>after</em> decoding them but prior to decoding other columns, which is often called “late materialization”.</p>
+
+<p>This technique is especially effective when:</p>
+
+<ul>
+  <li>The predicate is very selective, i.e. filters out large numbers of rows</li>
+  <li>Each row is large, either due to wide rows (e.g. JSON blobs) or many columns</li>
+  <li>The selected data is clustered together</li>
+  <li>The columns required by the predicate are relatively inexpensive to decode, e.g. PrimitiveArray / DictionaryArray</li>
+</ul>
+
+<p>There is additional discussion about the benefits of this technique in <a href="https://issues.apache.org/jira/browse/SPARK-36527">SPARK-36527</a> and <a href="https://docs.cloudera.com/cdw-runtime/cloud/impala-reference/topics/impala-lazy-materialization.html">Impala</a>.</p>
+
+<p>For example, given the predicate <code class="language-plaintext highlighter-rouge">A &gt; 35 AND B = "F"</code> from above, where the engine used the page index to determine that only 50 rows within the RowSelection of [200-244] could match, using late materialization the Parquet decoder:</p>
+
+<ul>
+  <li>Decodes the 50 values of Column A</li>
+  <li>Evaluates <code class="language-plaintext highlighter-rouge">A &gt; 35</code> on those 50 values</li>
+  <li>In this case, only 5 rows pass, resulting in the RowSelection:
+    <ul>
+      <li>RowSelection[205-206]</li>
+      <li>RowSelection[238-240]</li>
+    </ul>
+  </li>
+  <li>Decodes only those 5 rows of Column B</li>
+</ul>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  Row Index
+             ┌────────────────────┐            ┌────────────────────┐
+       200   │         30         │            │        "F"         │
+             └────────────────────┘            └────────────────────┘
+                      ...                               ...
+             ┌────────────────────┐            ┌────────────────────┐
+       205   │         37         │─ ─ ─ ─ ─ ─▶│        "F"         │
+             ├────────────────────┤            ├────────────────────┤
+       206   │         36         │─ ─ ─ ─ ─ ─▶│        "G"         │
+             └────────────────────┘            └────────────────────┘
+                      ...                               ...
+             ┌────────────────────┐            ┌────────────────────┐
+       238   │         36         │─ ─ ─ ─ ─ ─▶│        "F"         │
+             ├────────────────────┤            ├────────────────────┤
+       239   │         36         │─ ─ ─ ─ ─ ─▶│        "G"         │
+             ├────────────────────┤            ├────────────────────┤
+       240   │         40         │─ ─ ─ ─ ─ ─▶│        "G"         │
+             └────────────────────┘            └────────────────────┘
+                      ...                               ...
+             ┌────────────────────┐            ┌────────────────────┐
+      244    │         26         │            │        "D"         │
+             └────────────────────┘            └────────────────────┘
+
+
+                   Column A                          Column B
+                    Values                            Values
+</code></pre></div></div>
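<p>The decode–filter–decode sequence above can be sketched as follows, with toy in-memory slices standing in for real page decoding:</p>

```rust
/// Sketch of late materialization: evaluate the predicate on the cheap
/// column first, then touch the second column only at surviving rows.
fn late_materialize<'a>(
    col_a: &[i64],
    col_b: &'a [&'a str],
    a_threshold: i64,
    b_value: &str,
) -> Vec<(i64, &'a str)> {
    // Pass 1: evaluate A > threshold on the decoded values of A.
    let selection: Vec<usize> = col_a
        .iter()
        .enumerate()
        .filter(|(_, &a)| a > a_threshold)
        .map(|(i, _)| i)
        .collect();
    // Pass 2: "decode" and test B only at the selected row indexes.
    selection
        .into_iter()
        .filter(|&i| col_b[i] == b_value)
        .map(|i| (col_a[i], col_b[i]))
        .collect()
}

fn main() {
    // Rows 205-206 and 238-240 from the figure above:
    let a = [37, 36, 36, 36, 40];
    let b = ["F", "G", "F", "G", "G"];
    // A > 35 passes all 5 rows; B = "F" then keeps just 2 of them:
    assert_eq!(late_materialize(&a, &b, 35, "F"), vec![(37, "F"), (36, "F")]);
}
```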
+
+<p>In certain cases, such as our example where B stores single character values, the cost of late materialization machinery can outweigh the savings in decoding. However, the savings can be substantial when some of the conditions listed above are fulfilled. The query engine must decide which predicates to push down and in which order to apply them for optimal results.</p>
+
+<p>While it is outside the scope of this document, the same technique can be applied for multiple predicates as well as predicates on multiple columns. See the <a href="https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.RowFilter.html">RowFilter</a> interface in the Parquet crate for more information, and the <a href="https://github.com/apache/arrow-datafusion/blob/58b43f5c0b629be49a3efa0e37052ec51d9ba3fe/datafusion/core/src/physical_plan/file_format/parquet/row_filter.rs#L [...]
+
+<h1 id="io-pushdown">I/O pushdown</h1>
+
+<p>While Parquet was designed for efficient access on the <a href="https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html">HDFS distributed file system</a>, it works very well with commodity blob storage systems such as AWS S3 as they have very similar characteristics:</p>
+
+<ul>
+  <li><strong>Relatively slow “random access” reads</strong>: it is much more efficient to read large (MBs) sections of data in each request than issue many requests for smaller portions</li>
+  <li><strong>Significant latency</strong> before retrieving the first byte</li>
+  <li><strong>High per-request cost</strong>: often billed per request, regardless of the number of bytes read, which incentivizes fewer requests that each read a large contiguous section of data</li>
+</ul>
+
+<p>To read optimally from such systems, a Parquet reader must:</p>
+
+<ol>
+  <li>Minimize the number of I/O requests, while also applying the various pushdown techniques to avoid fetching large amounts of unused data.</li>
+  <li>Integrate with the appropriate task scheduling mechanism to interleave I/O and processing on the data that is fetched to avoid pipeline bottlenecks.</li>
+</ol>
+
+<p>As these are substantial engineering and integration challenges, many Parquet readers still require the files to be fetched in their entirety to local storage.</p>
+
+<p>Fetching entire files in order to process them is not ideal for several reasons:</p>
+
+<ol>
+  <li><strong>High Latency</strong>: Decoding cannot begin until the entire file is fetched (Parquet metadata is at the end of the file, so the decoder must see the end prior to decoding the rest)</li>
+  <li><strong>Wasted work</strong>: Fetching the entire file retrieves all necessary data, but potentially also large amounts of unnecessary data that will be skipped after reading the footer, needlessly increasing costs.</li>
+  <li><strong>Requires costly “locally attached” storage (or memory)</strong>: Many cloud environments do not offer computing resources with locally attached storage – they either rely on expensive network block storage such as AWS EBS or else restrict local storage to certain classes of VMs.</li>
+</ol>
+
+<p>Avoiding the need to buffer the entire file requires a sophisticated Parquet decoder, integrated with the I/O subsystem, that can initially fetch and decode the metadata followed by ranged fetches for the relevant data blocks, interleaved with the decoding of Parquet data. This optimization requires careful engineering to fetch large enough blocks of data from the object store that the per request overhead doesn’t dominate gains from reducing the bytes transferred. <a href="https://is [...]
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                       ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
+                                                                │
+                       │
+               Step 1: Fetch                                    │
+ Parquet       Parquet metadata
+ file on ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━▼━━━━━━━┓
+ Remote  ┃      ▒▒▒▒▒▒▒▒▒▒          ▒▒▒▒▒▒▒▒▒▒               ░░░░░░░░░░ ┃
+ Object  ┃      ▒▒▒data▒▒▒          ▒▒▒data▒▒▒               ░metadata░ ┃
+  Store  ┃      ▒▒▒▒▒▒▒▒▒▒          ▒▒▒▒▒▒▒▒▒▒               ░░░░░░░░░░ ┃
+         ┗━━━━━━━━━━━▲━━━━━━━━━━━━━━━━━━━━━▲━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
+                     │                     └ ─ ─ ─
+                                                  │
+                     │                   Step 2: Fetch only
+                      ─ ─ ─ ─ ─ ─ ─ ─ ─ relevant data blocks
+</code></pre></div></div>
+
+<p>Not shown in this diagram are details such as coalescing requests and the minimum request sizes an actual implementation needs.</p>
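<p>Request coalescing itself is a small but important piece: byte ranges that are close together are merged so nearby <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s are fetched with one object-store request. The function below is a simplified sketch of that logic (the <code class="language-plaintext highlighter-rouge">object_store</code> crate implements similar machinery internally):</p>

```rust
/// Merge half-open (start, end) byte ranges whose gap is at most
/// `max_gap`, trading a few wasted bytes for far fewer requests.
fn coalesce(mut ranges: Vec<(u64, u64)>, max_gap: u64) -> Vec<(u64, u64)> {
    ranges.sort_unstable();
    let mut out: Vec<(u64, u64)> = Vec::new();
    for (start, end) in ranges {
        match out.last_mut() {
            // Close enough to the previous request? Extend it instead.
            Some((_, prev_end)) if start <= *prev_end + max_gap => {
                *prev_end = (*prev_end).max(end);
            }
            _ => out.push((start, end)),
        }
    }
    out
}

fn main() {
    // Three ColumnChunk ranges; the first two are 10 bytes apart:
    let ranges = vec![(0, 100), (110, 200), (5_000, 6_000)];
    // With a 64-byte gap tolerance they collapse into two requests:
    assert_eq!(coalesce(ranges, 64), vec![(0, 200), (5_000, 6_000)]);
}
```

<p>Real readers tune the gap tolerance so that per-request overhead does not dominate the savings from skipping bytes.</p>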
+
+<p>The Rust Parquet crate provides an async Parquet reader that reads from any <a href="https://docs.rs/parquet/latest/parquet/arrow/async_reader/trait.AsyncFileReader.html">AsyncFileReader</a> and:</p>
+
+<ul>
+  <li>Efficiently reads from any storage medium that supports range requests</li>
+  <li>Integrates with Rust’s futures ecosystem to avoid blocking threads waiting on network I/O <a href="https://www.influxdata.com/blog/using-rustlangs-async-tokio-runtime-for-cpu-bound-tasks/">and can easily interleave CPU and network work</a></li>
+  <li>Requests multiple ranges simultaneously, to allow the implementation to coalesce adjacent ranges, fetch ranges in parallel, etc.</li>
+  <li>Uses the pushdown techniques described previously to eliminate fetching data where possible</li>
+  <li>Integrates easily with the Apache Arrow <a href="https://docs.rs/object_store/latest/object_store/">object_store</a> crate which you can read more about <a href="https://www.influxdata.com/blog/rust-object-store-donation/">here</a></li>
+</ul>
+
+<p>To give a sense of what is possible, the following picture shows a timeline of fetching the footer metadata from remote files, using that metadata to determine what Data Pages to read, and then fetching data and decoding simultaneously. This process often must be done for more than one file at a time in order to match network latency, bandwidth, and available CPU.</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                           begin
+          metadata        read of   end read
+            read  ─ ─ ─ ┐   data    of data          │
+ begin    complete         block     block
+read of                 │   │        │               │
+metadata  ─ ─ ─ ┐                                       At any time, there are
+             │          │   │        │               │     multiple network
+             │  ▼       ▼   ▼        ▼                  requests outstanding to
+  file 1     │ ░░░░░░░░░░   ▒▒▒read▒▒▒   ▒▒▒read▒▒▒  │    hide the individual
+             │ ░░░read░░░   ▒▒▒data▒▒▒   ▒▒▒data▒▒▒        request latency
+             │ ░metadata░                         ▓▓decode▓▓
+             │ ░░░░░░░░░░                         ▓▓▓data▓▓▓
+             │                                       │
+             │
+             │ ░░░░░░░░░░  ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒read▒▒▒▒│▒▒▒▒▒▒▒▒▒▒▒▒▒▒
+   file 2    │ ░░░read░░░  ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒data▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
+             │ ░metadata░                            │              ▓▓▓▓▓decode▓▓▓▓▓▓
+             │ ░░░░░░░░░░                                           ▓▓▓▓▓▓data▓▓▓▓▓▓▓
+             │                                       │
+             │
+             │                                     ░░│░░░░░░░  ▒▒▒read▒▒▒  ▒▒▒▒read▒▒▒▒▒
+   file 3    │                                     ░░░read░░░  ▒▒▒data▒▒▒  ▒▒▒▒data▒▒▒▒▒      ...
+             │                                     ░m│tadata░            ▓▓decode▓▓
+             │                                     ░░░░░░░░░░            ▓▓▓data▓▓▓
+             └───────────────────────────────────────┼──────────────────────────────▶Time
+
+
+                                                     │
+</code></pre></div></div>
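<p>The essential idea in the timeline above — decoding already-fetched blocks while later fetches are still in flight — can be sketched with a channel between a fetch task and a decode loop. The real reader does this with async tasks rather than OS threads; the bytes and the "decode" here are toy stand-ins:</p>

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();

    // "I/O" thread: streams data blocks (here, just fabricates them),
    // standing in for ranged fetches from an object store.
    let fetcher = thread::spawn(move || {
        for block in 0..3u8 {
            tx.send(vec![block; 4]).unwrap();
        }
        // Dropping tx closes the channel, ending the decode loop below.
    });

    // "CPU" side: decodes each block as soon as it arrives, overlapping
    // with the fetch of the next block instead of waiting for the file.
    let mut decoded_rows = 0;
    for block in rx {
        decoded_rows += block.len(); // pretend decode
    }
    fetcher.join().unwrap();
    assert_eq!(decoded_rows, 12);
}
```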
+
+<h1 id="conclusion">Conclusion</h1>
+
+<p>We hope you enjoyed reading about the Parquet file format and the various techniques used to query Parquet files quickly.</p>
+
+<p>We believe the reason most open source Parquet implementations lack the breadth of features described in this post is that it takes a monumental effort, one that was previously possible only at well-financed commercial enterprises that kept their implementations closed source.</p>
+
+<p>However, with the growth and quality of the Apache Arrow community, both Rust practitioners and the wider Arrow community, our ability to collaborate and build a cutting-edge open source implementation is exhilarating and immensely satisfying. The technology described in this blog is the result of the contributions of many engineers spread across companies, hobbyists, and the world in several repositories, notably <a href="https://github.com/apache/arrow-datafusion">Apache Arrow DataF [...]
+
+<p>If you are interested in joining the DataFusion Community, please <a href="https://arrow.apache.org/datafusion/contributor-guide/communication.html">get in touch</a>.</p>
+
+      </main>
+    </div>
+
+    <hr/>
+<footer class="footer">
+  <div class="row">
+    <div class="col-md-9">
+      <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache Arrow project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p>
+      <p>&copy; 2016-2022 The Apache Software Foundation</p>
+    </div>
+    <div class="col-md-3">
+      <a class="d-sm-none d-md-inline pr-2" href="https://www.apache.org/events/current-event.html">
+        <img src="https://www.apache.org/events/current-event-234x60.png"/>
+      </a>
+    </div>
+  </div>
+</footer>
+
+  </div>
+</body>
+</html>
diff --git a/blog/index.html b/blog/index.html
index 7e8cc4b79ce..4a2df40fe13 100644
--- a/blog/index.html
+++ b/blog/index.html
@@ -236,6 +236,21 @@
 
 
   
+  <p>
+    <h3>
+      <a href="/blog/2022/12/26/querying-parquet-with-millisecond-latency/">Querying Parquet with Millisecond Latency</a>
+    </h3>
+    
+    <p>
+    <span class="blog-list-date">
+      26 December 2022
+    </span>
+    </p>
+    Querying Parquet with Millisecond Latency Note: this article was originally published on the InfluxData Blog. We believe that querying data in Apache Parquet files directly can achieve similar or better storage efficiency and query performance than most specialized file formats. While it requires significant engineering effort, the benefits of Parquet’s...
+  </p>
+  
+
+  
   <p>
     <h3>
       <a href="/blog/2022/11/22/10.0.1-release/">Apache Arrow 10.0.1 Release</a>
diff --git a/docs/c_glib/index.html b/docs/c_glib/index.html
index 1e277f24b44..6438b829a5b 100644
--- a/docs/c_glib/index.html
+++ b/docs/c_glib/index.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow GLib (C)" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow GLib (C) Apache Arrow GLib is a wrapper library for Apache Arrow C++. Apache Arrow GLib provides C API. Apache Arrow GLib supports GObject Introspection. It means that you can create language bindings at runtime or compile time automatically. API reference manuals Apache Arrow GLib Apache Parquet GLib Gandiva GLib Plasma [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow GLib (C) Apache Arrow GLib is a wrapper library for Apache Arrow C++. Apache Arrow GLib provides C API. Apache Arrow GLib supports GObject Introspection. It means that you can create language bindings at runtime or compile time automatically. API reference manuals Apache Arrow GLib Apache Parquet GLib Gandiva GLib Plasma [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/feed.xml b/feed.xml
index 26d93e49274..c6aa68146c3 100644
--- a/feed.xml
+++ b/feed.xml
@@ -1,4 +1,565 @@
-<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.2.0">Jekyll</generator><link href="https://arrow.apache.org/feed.xml" rel="self" type="application/atom+xml" /><link href="https://arrow.apache.org/" rel="alternate" type="text/html" /><updated>2022-12-20T11:05:45-05:00</updated><id>https://arrow.apache.org/feed.xml</id><title type="html">Apache Arrow</title><subtitle>Apache Arrow is a cross-language developm [...]
+<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.2.0">Jekyll</generator><link href="https://arrow.apache.org/feed.xml" rel="self" type="application/atom+xml" /><link href="https://arrow.apache.org/" rel="alternate" type="text/html" /><updated>2022-12-26T16:19:40-05:00</updated><id>https://arrow.apache.org/feed.xml</id><title type="html">Apache Arrow</title><subtitle>Apache Arrow is a cross-language developm [...]
+
+-->
+
+<h1 id="querying-parquet-with-millisecond-latency">Querying Parquet with Millisecond Latency</h1>
+<p><em>Note: this article was originally published on the <a href="https://www.influxdata.com/blog/querying-parquet-millisecond-latency">InfluxData Blog</a>.</em></p>
+
+<p>We believe that querying data in <a href="https://parquet.apache.org/">Apache Parquet</a> files directly can achieve similar or better storage efficiency and query performance than most specialized file formats. While it requires significant engineering effort, the benefits of Parquet’s open format and broad ecosystem support make it the obvious choice for a wide class of data systems.</p>
+
+<p>In this article we explain several advanced techniques needed to query data stored in the Parquet format quickly that we implemented in the <a href="https://docs.rs/parquet/27.0.0/parquet/">Apache Arrow Rust Parquet reader</a>. Together these techniques make the Rust implementation one of, if not the, fastest implementation for querying Parquet files — be it on local disk or remote object storage. It is able to query GBs of Parquet in a <a href="https://github.com/tustvold/access-log- [...]
+
+<p>We would like to acknowledge and thank <a href="https://www.influxdata.com/">InfluxData</a> for their support of this work. InfluxData has a deep and continuing commitment to Open source software, and it sponsored much of our time for writing this blog post as well as many contributions as part of building the <a href="https://www.influxdata.com/blog/influxdb-engine/">InfluxDB IOx Storage Engine</a>.</p>
+
+<h1 id="background">Background</h1>
+
+<p><a href="https://parquet.apache.org/">Apache Parquet</a> is an increasingly popular open format for storing <a href="https://www.influxdata.com/glossary/olap/">analytic datasets</a>, and has become the de-facto standard for cost-effective, DBMS-agnostic data storage. Initially created for the Hadoop ecosystem, Parquet’s reach now expands broadly across the data analytics ecosystem due to its compelling combination of:</p>
+
+<ul>
+  <li>High compression ratios</li>
+  <li>Amenability to commodity blob-storage such as S3</li>
+  <li>Broad ecosystem and tooling support</li>
+  <li>Portability across many different platforms and tools</li>
+  <li>Support for <a href="https://arrow.apache.org/blog/2022/10/05/arrow-parquet-encoding-part-1/">arbitrarily structured data</a></li>
+</ul>
+
+<p>Increasingly other systems, such as <a href="https://duckdb.org/2021/06/25/querying-parquet.html">DuckDB</a> and <a href="https://docs.aws.amazon.com/redshift/latest/dg/c-using-spectrum.html#c-spectrum-overview">Redshift</a> allow querying data stored in Parquet directly, but support is still often a secondary consideration compared to their native (custom) file formats. Such formats include the DuckDB <code class="language-plaintext highlighter-rouge">.duckdb</code> file format, the  [...]
+
+<p>For the first time, access to the same sophisticated query techniques, previously only available in closed source commercial implementations, are now available as open source. The required engineering capacity comes from large, well-run open source projects with global contributor communities, such as <a href="https://arrow.apache.org/">Apache Arrow</a> and <a href="https://impala.apache.org/">Apache Impala</a>.</p>
+
+<h1 id="parquet-file-format">Parquet file format</h1>
+
+<p>Before diving into the details of efficiently reading from <a href="https://www.influxdata.com/glossary/apache-parquet/">Parquet</a>, it is important to understand the file layout. The file format is carefully designed to quickly locate the desired information, skip irrelevant portions, and decode what remains efficiently.</p>
+
+<ul>
+  <li>The data in a Parquet file is broken into horizontal slices called <code class="language-plaintext highlighter-rouge">RowGroup</code>s</li>
+  <li>Each <code class="language-plaintext highlighter-rouge">RowGroup</code> contains a single <code class="language-plaintext highlighter-rouge">ColumnChunk</code> for each column in the schema</li>
+</ul>
+
+<p>For example, the following diagram illustrates a Parquet file with three columns “A”, “B” and “C” stored in two <code class="language-plaintext highlighter-rouge">RowGroup</code>s for a total of 6 <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s.</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃┏━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━┓          ┃
+┃┃┌ ─ ─ ─ ─ ─ ─ ┌ ─ ─ ─ ─ ─ ─ ┐┌ ─ ─ ─ ─ ─ ─  ┃          ┃
+┃┃             │                            │ ┃          ┃
+┃┃│             │             ││              ┃          ┃
+┃┃             │                            │ ┃          ┃
+┃┃│             │             ││              ┃ RowGroup ┃
+┃┃             │                            │ ┃     1    ┃
+┃┃│             │             ││              ┃          ┃
+┃┃             │                            │ ┃          ┃
+┃┃└ ─ ─ ─ ─ ─ ─ └ ─ ─ ─ ─ ─ ─ ┘└ ─ ─ ─ ─ ─ ─  ┃          ┃
+┃┃ColumnChunk 1  ColumnChunk 2 ColumnChunk 3  ┃          ┃
+┃┃ (Column "A")   (Column "B")  (Column "C")  ┃          ┃
+┃┗━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━┛          ┃
+┃┏━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━┓          ┃
+┃┃┌ ─ ─ ─ ─ ─ ─ ┌ ─ ─ ─ ─ ─ ─ ┐┌ ─ ─ ─ ─ ─ ─  ┃          ┃
+┃┃             │                            │ ┃          ┃
+┃┃│             │             ││              ┃          ┃
+┃┃             │                            │ ┃          ┃
+┃┃│             │             ││              ┃ RowGroup ┃
+┃┃             │                            │ ┃     2    ┃
+┃┃│             │             ││              ┃          ┃
+┃┃             │                            │ ┃          ┃
+┃┃└ ─ ─ ─ ─ ─ ─ └ ─ ─ ─ ─ ─ ─ ┘└ ─ ─ ─ ─ ─ ─  ┃          ┃
+┃┃ColumnChunk 4  ColumnChunk 5 ColumnChunk 6  ┃          ┃
+┃┃ (Column "A")   (Column "B")  (Column "C")  ┃          ┃
+┃┗━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━┛          ┃
+┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
+</code></pre></div></div>
+
+<p>The logical values for a <code class="language-plaintext highlighter-rouge">ColumnChunk</code> are written using one of the many <a href="https://parquet.apache.org/docs/file-format/data-pages/encodings/">available encodings</a> into one or more Data Pages appended sequentially in the file. At the end of a Parquet file is a footer, which contains important metadata, such as:</p>
+
+<ul>
+  <li>The file’s schema information such as column names and types</li>
+  <li>The locations of the <code class="language-plaintext highlighter-rouge">RowGroup</code> and <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s in the file</li>
+</ul>
+
+<p>The footer may also contain other specialized data structures:</p>
+
+<ul>
+  <li>Optional statistics for each <code class="language-plaintext highlighter-rouge">ColumnChunk</code> including min/max values and null counts</li>
+  <li>Optional pointers to <a href="https://github.com/apache/parquet-format/blob/54e53e5d7794d383529dd30746378f19a12afd58/src/main/thrift/parquet.thrift#L926-L932">OffsetIndexes</a> containing the location of each individual Page</li>
+  <li>Optional pointers to <a href="https://github.com/apache/parquet-format/blob/54e53e5d7794d383529dd30746378f19a12afd58/src/main/thrift/parquet.thrift#L938">ColumnIndex</a> containing row counts and summary statistics for each Page</li>
+  <li>Optional pointers to <a href="https://github.com/apache/parquet-format/blob/54e53e5d7794d383529dd30746378f19a12afd58/src/main/thrift/parquet.thrift#L621-L630">BloomFilterData</a>, which can quickly check if a value is present in a <code class="language-plaintext highlighter-rouge">ColumnChunk</code></li>
+</ul>
+
+<p>For example, the logical structure of 2 Row Groups and 6 <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s in the previous diagram might be stored in a Parquet file as shown in the following diagram (not to scale). The pages for the <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s come first, followed by the footer. The data, the effectiveness of the encoding scheme, and the settings of the Parquet encoder determine the number of and size of  [...]
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃
+┃ Data Page for ColumnChunk 1 ("A")             ◀─┃─ ─ ─│
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃     │
+┃ Data Page for ColumnChunk 1 ("A")               ┃
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃     │
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃
+┃ Data Page for ColumnChunk 2 ("B")               ┃     │
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃     │
+┃ Data Page for ColumnChunk 3 ("C")               ┃
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃     │
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃
+┃ Data Page for ColumnChunk 3 ("C")               ┃     │
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃     │
+┃ Data Page for ColumnChunk 3 ("C")               ┃
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃     │
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃
+┃ Data Page for ColumnChunk 4 ("A")             ◀─┃─ ─ ─│─ ┐
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃     │  │
+┃ Data Page for ColumnChunk 5 ("B")               ┃
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃     │  │
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃
+┃ Data Page for ColumnChunk 5 ("B")               ┃     │  │
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃     │  │
+┃ Data Page for ColumnChunk 5 ("B")               ┃
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃     │  │
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐  ┃
+┃ Data Page for ColumnChunk 6 ("C")               ┃     │  │
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃
+┃┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃     │  │
+┃┃Footer                                        ┃ ┃
+┃┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ ┃     │  │
+┃┃ ┃File Metadata                             ┃ ┃ ┃
+┃┃ ┃ Schema, etc                              ┃ ┃ ┃     │  │
+┃┃ ┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓     ┃ ┃ ┃
+┃┃ ┃ ┃Row Group 1 Metadata              ┃     ┃ ┃ ┃     │  │
+┃┃ ┃ ┃┏━━━━━━━━━━━━━━━━━━━┓             ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┃Column "A" Metadata┃ Location of ┃     ┃ ┃ ┃     │  │
+┃┃ ┃ ┃┗━━━━━━━━━━━━━━━━━━━┛ first Data  ┣ ─ ─ ╋ ╋ ╋ ─ ─
+┃┃ ┃ ┃┏━━━━━━━━━━━━━━━━━━━┓ Page, row   ┃     ┃ ┃ ┃        │
+┃┃ ┃ ┃┃Column "B" Metadata┃ counts,     ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┗━━━━━━━━━━━━━━━━━━━┛ sizes,      ┃     ┃ ┃ ┃        │
+┃┃ ┃ ┃┏━━━━━━━━━━━━━━━━━━━┓ min/max     ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┃Column "C" Metadata┃ values, etc ┃     ┃ ┃ ┃        │
+┃┃ ┃ ┃┗━━━━━━━━━━━━━━━━━━━┛             ┃     ┃ ┃ ┃
+┃┃ ┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛     ┃ ┃ ┃        │
+┃┃ ┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓     ┃ ┃ ┃
+┃┃ ┃ ┃Row Group 2 Metadata              ┃     ┃ ┃ ┃        │
+┃┃ ┃ ┃┏━━━━━━━━━━━━━━━━━━━┓ Location of ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┃Column "A" Metadata┃ first Data  ┃     ┃ ┃ ┃        │
+┃┃ ┃ ┃┗━━━━━━━━━━━━━━━━━━━┛ Page, row   ┣ ─ ─ ╋ ╋ ╋ ─ ─ ─ ─
+┃┃ ┃ ┃┏━━━━━━━━━━━━━━━━━━━┓ counts,     ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┃Column "B" Metadata┃ sizes,      ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┗━━━━━━━━━━━━━━━━━━━┛ min/max     ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┏━━━━━━━━━━━━━━━━━━━┓ values, etc ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┃Column "C" Metadata┃             ┃     ┃ ┃ ┃
+┃┃ ┃ ┃┗━━━━━━━━━━━━━━━━━━━┛             ┃     ┃ ┃ ┃
+┃┃ ┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛     ┃ ┃ ┃
+┃┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃ ┃
+┃┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃
+┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
+</code></pre></div></div>
+
+<p>There are many important criteria to consider when creating Parquet files such as how to optimally order/cluster data and structure it into <code class="language-plaintext highlighter-rouge">RowGroup</code>s and Data Pages. Such “physical design” considerations are complex, worthy of their own series of articles, and not addressed in this blog post. Instead, we focus on how to use the available structure to make queries very fast.</p>
+
+<h1 id="optimizing-queries">Optimizing queries</h1>
+
+<p>In any query processing system, the following techniques generally improve performance:</p>
+
+<ol>
+  <li>Reduce the data that must be transferred from secondary storage for processing (reduce I/O)</li>
+  <li>Reduce the computational load for decoding the data (reduce CPU)</li>
+  <li>Interleave/pipeline the reading and decoding of the data (improve parallelism)</li>
+</ol>
+
+<p>The same principles apply to querying Parquet files, as we describe below.</p>
+
+<h1 id="decode-optimization">Decode optimization</h1>
+
+<p>Parquet achieves impressive compression ratios by using <a href="https://parquet.apache.org/docs/file-format/data-pages/encodings/">sophisticated encoding techniques</a> such as run length compression, dictionary encoding, delta encoding, and others. Consequently, the CPU-bound task of decoding can dominate query latency. Parquet readers can use a number of techniques to improve the latency and throughput of this task, as we have done in the Rust implementation.</p>
+
+<h2 id="vectorized-decode">Vectorized decode</h2>
+
+<p>Most analytic systems decode multiple values at a time to a columnar memory format, such as Apache Arrow, rather than processing data row-by-row. This is often called vectorized or columnar processing, and is beneficial because it:</p>
+
+<ul>
+  <li>Amortizes dispatch overheads to switch on the type of column being decoded</li>
+  <li>Improves cache locality by reading consecutive values from a <code class="language-plaintext highlighter-rouge">ColumnChunk</code></li>
+  <li>Often allows multiple values to be decoded in a single instruction</li>
+  <li>Avoids many small heap allocations by making a single large allocation, yielding significant savings for variable-length types such as strings and byte arrays</li>
+</ul>
+
+<p>Thus, the Rust Parquet Reader implements specialized decoders for reading Parquet directly into a <a href="https://www.influxdata.com/glossary/column-database/">columnar</a> memory format (Arrow Arrays).</p>
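To make the benefit concrete, here is a minimal, self-contained sketch of vectorized dictionary decoding — one up-front allocation and a tight loop over the keys, rather than a per-row decode-and-allocate cycle. This is an illustration of the technique only, not the parquet crate's actual decoder:

```rust
// Illustrative sketch of vectorized decode: materialize a dictionary-encoded
// i32 column in a single pass. One up-front allocation covers all rows, and
// the tight loop over `keys` is cache-friendly and amenable to
// auto-vectorization by the compiler.
fn decode_dictionary(keys: &[u32], dictionary: &[i32]) -> Vec<i32> {
    let mut out = Vec::with_capacity(keys.len()); // single allocation for all rows
    out.extend(keys.iter().map(|&k| dictionary[k as usize]));
    out
}
```

A row-by-row decoder would instead pay a dispatch and bounds-checking cost per value, which is exactly the overhead vectorized decoding amortizes away.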
+
+<h2 id="streaming-decode">Streaming decode</h2>
+
+<p>There is no relationship between which rows are stored in which Pages across <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s. For example, the logical values for the 10,000th row may be in the first page of column A and in the third page of column B.</p>
+
+<p>The simplest approach to vectorized decoding, and the one often initially implemented in Parquet decoders, is to decode an entire <code class="language-plaintext highlighter-rouge">RowGroup</code> (or <code class="language-plaintext highlighter-rouge">ColumnChunk</code>) at a time.</p>
+
+<p>However, given Parquet’s high compression ratios, a single <code class="language-plaintext highlighter-rouge">RowGroup</code> may well contain millions of rows. Decoding so many rows at once is suboptimal because it:</p>
+
+<ul>
+  <li><strong>Requires large amounts of intermediate RAM</strong>: typical in-memory formats optimized for processing, such as Apache Arrow, require much more memory than their Parquet-encoded form.</li>
+  <li><strong>Increases query latency</strong>: Subsequent processing steps (like filtering or aggregation) can only begin once the entire <code class="language-plaintext highlighter-rouge">RowGroup</code> (or <code class="language-plaintext highlighter-rouge">ColumnChunk</code>) is decoded.</li>
+</ul>
+
+<p>As such, the best Parquet readers support “streaming” data out by producing configurably sized batches of rows on demand. The batch size must be large enough to amortize decode overhead, but small enough for efficient memory usage and to allow downstream processing to begin concurrently while the subsequent batch is decoded.</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃
+┃ Data Page for ColumnChunk 1 │◀┃─                   ┌── ─── ─── ─── ─── ┐
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃ │   ┏━━━━━━━┓        ┌ ─ ┐ ┌ ─ ┐ ┌ ─ ┐ │
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃     ┃       ┃      │                   │
+┃ Data Page for ColumnChunk 1 │ ┃ │   ┃       ┃   ─ ▶│ │   │ │   │ │   │
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃  ─ ─┃       ┃─ ┤   │  ─ ─   ─ ─   ─ ─  │
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃ │   ┃       ┃           A    B     C   │
+┃ Data Page for ColumnChunk 2 │◀┃─    ┗━━━━━━━┛  │   └── ─── ─── ─── ─── ┘
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃ │    Parquet
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃      Decoder   │            ...
+┃ Data Page for ColumnChunk 3 │ ┃ │
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃                │   ┌── ─── ─── ─── ─── ┐
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃ │                    ┌ ─ ┐ ┌ ─ ┐ ┌ ─ ┐ │
+┃ Data Page for ColumnChunk 3 │◀┃─               │   │                   │
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃                 ─ ▶│ │   │ │   │ │   │
+┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃                    │  ─ ─   ─ ─   ─ ─  │
+┃ Data Page for ColumnChunk 3 │ ┃                         A    B     C   │
+┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  ┃                    └── ─── ─── ─── ─── ┘
+┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
+
+      Parquet file                                    Smaller in memory
+                                                         batches for
+                                                         processing
+</code></pre></div></div>
+
+<p>While streaming is not a complicated feature to explain, the stateful nature of decoding, especially across multiple columns and <a href="https://arrow.apache.org/blog/2022/10/05/arrow-parquet-encoding-part-1/">arbitrarily nested data</a>, where the relationship between rows and values is not fixed, requires <a href="https://github.com/apache/arrow-rs/blob/b7af85cb8dfe6887bb3fd43d1d76f659473b6927/parquet/src/arrow/record_reader/mod.rs">complex intermediate buffering</a> and significan [...]
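The consumer-facing shape of such a streaming reader can be sketched as an iterator that yields fixed-size batches on demand. This is an illustrative stand-in only — the real reader yields decoded Arrow RecordBatches, not row ranges, and the `total_rows` and `batch_size` fields here are hypothetical:

```rust
// Illustrative sketch of a streaming decode interface: rows are handed out in
// configurable-size batches so downstream processing can begin before the
// whole RowGroup is decoded. A row range stands in for a decoded batch.
struct StreamingReader {
    total_rows: usize, // rows in the RowGroup (hypothetical field)
    batch_size: usize, // rows per produced batch (hypothetical field)
    next_row: usize,   // decode cursor
}

impl Iterator for StreamingReader {
    type Item = std::ops::Range<usize>; // stand-in for one decoded batch

    fn next(&mut self) -> Option<Self::Item> {
        if self.next_row >= self.total_rows {
            return None; // RowGroup exhausted
        }
        let end = (self.next_row + self.batch_size).min(self.total_rows);
        let batch = self.next_row..end;
        self.next_row = end;
        Some(batch)
    }
}
```

A caller simply iterates, processing each batch while the reader prepares the next — the final batch may be short, as the last range shows.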
+
+<h2 id="dictionary-preservation">Dictionary preservation</h2>
+
+<p>Dictionary Encoding, also called <a href="https://pandas.pydata.org/docs/user_guide/categorical.html">categorical</a> encoding, is a technique where each value in a column is not stored directly, but instead, an index in a separate list called a “Dictionary” is stored. This technique achieves many of the benefits of <a href="https://en.wikipedia.org/wiki/Third_normal_form#:~:text=Third%20normal%20form%20(3NF)%20is,in%201971%20by%20Edgar%20F.">third normal form</a> for columns that hav [...]
+
+<p>The first page in a <code class="language-plaintext highlighter-rouge">ColumnChunk</code> can optionally be a dictionary page, containing a list of values of the column’s type. Subsequent pages within this <code class="language-plaintext highlighter-rouge">ColumnChunk</code> can then encode an index into this dictionary, instead of encoding the values directly.</p>
+
+<p>Given the effectiveness of this encoding, if a Parquet decoder simply decodes dictionary data into the native type, it will inefficiently replicate the same value over and over again, which is especially disastrous for string data. To handle dictionary-encoded data efficiently, the encoding must be preserved during decode. Conveniently, many columnar formats, such as the Arrow <a href="https://docs.rs/arrow/27.0.0/arrow/array/struct.DictionaryArray.html">DictionaryArray</a>, support s [...]
+
+<p>Preserving dictionary encoding drastically improves performance when reading to an Arrow array, in some cases in excess of <a href="https://github.com/apache/arrow-rs/pull/1180">60x</a>, as well as using significantly less memory.</p>
+
+<p>The major complicating factor for preserving dictionaries is that the dictionaries are stored per <code class="language-plaintext highlighter-rouge">ColumnChunk</code>, and therefore the dictionary changes between <code class="language-plaintext highlighter-rouge">RowGroup</code>s. The reader must automatically recompute a dictionary for batches that span multiple <code class="language-plaintext highlighter-rouge">RowGroup</code>s, while also optimizing for the case that batch sizes d [...]
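A rough sketch of why preservation matters: storing each distinct string once, plus a small integer key per row, takes far less memory than materializing the string for every row. The helper functions and sample values below are illustrative only, not the Arrow DictionaryArray implementation:

```rust
// Illustrative memory accounting for a dictionary-preserved string column
// versus a fully materialized one. Buffer overheads are ignored; only the
// dominant payload bytes are counted.
fn dictionary_size(keys: &[u32], values: &[&str]) -> usize {
    // one u32 key per row, plus each distinct string stored exactly once
    keys.len() * std::mem::size_of::<u32>()
        + values.iter().map(|v| v.len()).sum::<usize>()
}

fn materialized_size(keys: &[u32], values: &[&str]) -> usize {
    // every row pays the full byte length of its string
    keys.iter().map(|&k| values[k as usize].len()).sum()
}
```

For a low-cardinality column the gap grows linearly with the row count, which is why naive dictionary materialization is, as noted above, especially disastrous for string data.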
+
+<h1 id="projection-pushdown">Projection pushdown</h1>
+
+<p>The most basic Parquet optimization, and the one most commonly described for Parquet files, is <em>projection pushdown</em>, which reduces both I/O and CPU requirements. Projection in this context means “selecting some but not all of the columns.” Given how Parquet organizes data, it is straightforward to read and decode only the <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s required for the referenced columns.</p>
+
+<p>For example, consider a SQL query of the form</p>
+
+<pre><code class="language-SQL">SELECT B from table where A &gt; 35
+</code></pre>
+
+<p>This query only needs data for columns A and B (and not C) and the projection can be “pushed down” to the Parquet reader.</p>
+
+<p>Specifically, using the information in the footer, the Parquet reader can entirely skip fetching (I/O) and decoding (CPU) the Data Pages that store data for column C (<code class="language-plaintext highlighter-rouge">ColumnChunk</code> 3 and <code class="language-plaintext highlighter-rouge">ColumnChunk</code> 6 in our example).</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                             ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+                             ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                       ┌─────▶ Data Page for ColumnChunk 1 ("A") ┃
+                       │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+                       │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                       ├─────▶ Data Page for ColumnChunk 1 ("A") ┃
+                       │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+                       │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                       ├─────▶ Data Page for ColumnChunk 2 ("B") ┃
+                       │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+                       │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                       │     ┃ Data Page for ColumnChunk 3 ("C") ┃
+                       │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+   A query that        │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+  accesses only        │     ┃ Data Page for ColumnChunk 3 ("C") ┃
+ columns A and B       │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+can read only the      │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+ relevant pages,  ─────┤     ┃ Data Page for ColumnChunk 3 ("C") ┃
+skipping any Data      │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+Page for column C      │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                       ├─────▶ Data Page for ColumnChunk 4 ("A") ┃
+                       │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+                       │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                       ├─────▶ Data Page for ColumnChunk 5 ("B") ┃
+                       │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+                       │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                       ├─────▶ Data Page for ColumnChunk 5 ("B") ┃
+                       │     ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+                       │     ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                       └─────▶ Data Page for ColumnChunk 5 ("B") ┃
+                             ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+                             ┃┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐┃
+                             ┃ Data Page for ColumnChunk 6 ("C") ┃
+                             ┃└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘┃
+                             ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
+</code></pre></div></div>
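Sketched in code, projection pushdown amounts to selecting only the ColumnChunks whose column appears in the projection. The function below is an illustrative stand-in — the real reader works from the footer metadata, not a flat array of column names:

```rust
// Illustrative projection pushdown: given the column name of each ColumnChunk
// in file order, return the (zero-based) indices of the chunks that must be
// fetched and decoded for the projected columns.
fn project_chunks(chunk_columns: &[&str], projection: &[&str]) -> Vec<usize> {
    chunk_columns
        .iter()
        .enumerate()
        .filter(|(_, col)| projection.contains(col)) // keep referenced columns only
        .map(|(i, _)| i)
        .collect()
}
```

With the six-chunk layout from the running example (A, B, C, A, B, C) and a projection of A and B, the chunks for column C are never touched, skipping both their I/O and their decode cost.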
+
+<h1 id="predicate-pushdown">Predicate pushdown</h1>
+
+<p>Similar to projection pushdown, <strong>predicate</strong> pushdown also avoids fetching and decoding data from Parquet files, but does so using filter expressions. This technique typically requires closer integration with a query engine such as <a href="https://arrow.apache.org/datafusion/">DataFusion</a>, to determine valid predicates and evaluate them during the scan. Unfortunately without careful API design, the Parquet decoder and query engine can end up tightly coupled, preventi [...]
+
+<h2 id="rowgroup-pruning"><code class="language-plaintext highlighter-rouge">RowGroup</code> pruning</h2>
+
+<p>The simplest form of predicate pushdown, supported by many Parquet based query engines, uses the statistics stored in the footer to skip entire <code class="language-plaintext highlighter-rouge">RowGroup</code>s. We call this operation <code class="language-plaintext highlighter-rouge">RowGroup</code> <em>pruning</em>, and it is analogous to <a href="https://docs.oracle.com/database/121/VLDBG/GUID-E677C85E-C5E3-4927-B3DF-684007A7B05D.htm#VLDBG00401">partition pruning</a> in many class [...]
+
+<p>For the example query above, if the maximum value for A in a particular <code class="language-plaintext highlighter-rouge">RowGroup</code> is less than 35, the decoder can skip fetching and decoding any <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s from that <strong>entire</strong> <code class="language-plaintext highlighter-rouge">RowGroup</code>.</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃Row Group 1 Metadata                      ┃
+┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃
+┃ ┃Column "A" Metadata    Min:0 Max:15   ┃◀╋ ┐
+┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃       Using the min
+┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ │     and max values
+┃ ┃Column "B" Metadata                   ┃ ┃       from the
+┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃ │     metadata,
+┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃       RowGroup 1  can
+┃ ┃Column "C" Metadata                   ┃ ┃ ├ ─ ─ be entirely
+┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃       skipped
+┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ │     (pruned) when
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓       searching for
+┃Row Group 2 Metadata                      ┃ │     rows with A &gt;
+┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃       35,
+┃ ┃Column "A" Metadata   Min:10 Max:50   ┃◀╋ ┘
+┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃
+┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃
+┃ ┃Column "B" Metadata                   ┃ ┃
+┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃
+┃ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃
+┃ ┃Column "C" Metadata                   ┃ ┃
+┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ ┃
+┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
+</code></pre></div></div>
+
+<p>Note that pruning on minimum and maximum values is effective for many data layouts and column types, but not all. Specifically, it is not as effective for columns with many distinct pseudo-random values (e.g. identifiers or uuids). Thankfully for this use case, Parquet also supports per <code class="language-plaintext highlighter-rouge">ColumnChunk</code> <a href="https://github.com/apache/parquet-format/blob/master/BloomFilter.md">Bloom Filters</a>. We are actively working on <a href [...]
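The pruning rule itself is simple to sketch. Assuming the footer statistics are available as (min, max) pairs per RowGroup — an illustration, not the actual metadata API — a predicate like `A > 35` keeps only the RowGroups whose maximum value could possibly satisfy it:

```rust
// Illustrative RowGroup pruning for a predicate `value > lower_bound`: a
// RowGroup can be skipped entirely when even its maximum value fails the
// predicate. Returns the indices of RowGroups that must still be scanned.
fn prune_row_groups(stats: &[(i64, i64)], lower_bound: i64) -> Vec<usize> {
    let mut keep = Vec::new();
    for (i, (_min, max)) in stats.iter().enumerate() {
        // `_min` would be consulted for predicates like `value < x`
        if *max > lower_bound {
            keep.push(i);
        }
    }
    keep
}
```

For the statistics in the diagram above (RowGroup 1: min 0 / max 15, RowGroup 2: min 10 / max 50) and `A > 35`, only RowGroup 2 survives.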
+
+<h2 id="page-pruning">Page pruning</h2>
+
+<p>A more sophisticated form of predicate pushdown uses the optional <a href="https://github.com/apache/parquet-format/blob/master/PageIndex.md">page index</a> in the footer metadata to rule out entire Data Pages. The decoder then decodes only the corresponding rows from the other columns, often skipping entire pages of those columns as well.</p>
+
+<p>This optimization is complicated by the fact that pages in different <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s often contain different numbers of rows, since page boundaries need not line up across columns. While the page index may identify the needed pages from one column, pruning a page from one column doesn’t immediately rule out entire pages in other columns.</p>
+
+<p>Page pruning proceeds as follows. The reader:</p>
+
+<ul>
+  <li>Uses the predicates in combination with the page index to identify pages to skip</li>
+  <li>Uses the offset index to determine what row ranges correspond to non-skipped pages</li>
+  <li>Computes the intersection of the row ranges across columns, and decodes only those rows</li>
+</ul>
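As a rough illustration of the first two steps, the sketch below combines per-page statistics with the offset index's first_row values to turn surviving pages into row ranges. The helper name, the dict layout, and the 500-row total are hypothetical; a real reader works from the decoded page index structures:

```python
def rows_for_kept_pages(pages, total_rows, keep):
    """Map each data page kept by the predicate to its inclusive row range.

    `pages` is a list of dicts with `first_row`, `min`, and `max`, in row order;
    a page ends where the next page begins (or at the last row of the chunk).
    """
    ranges = []
    for i, page in enumerate(pages):
        last_row = pages[i + 1]["first_row"] - 1 if i + 1 < len(pages) else total_rows - 1
        if keep(page):
            ranges.append((page["first_row"], last_row))
    return ranges

# Column A from the example that follows: Page 1 (max 20) is pruned for A > 35.
pages_a = [
    {"first_row": 0, "min": 10, "max": 20},
    {"first_row": 200, "min": 30, "max": 40},
]
assert rows_for_kept_pages(pages_a, 500, lambda p: p["max"] > 35) == [(200, 499)]
```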
+
+<p>This last point is highly non-trivial to implement, especially for nested lists where <a href="https://arrow.apache.org/blog/2022/10/08/arrow-parquet-encoding-part-2/">a single row may correspond to multiple values</a>. Fortunately, the Rust Parquet reader hides this complexity internally, and can decode arbitrary <a href="https://docs.rs/parquet/27.0.0/parquet/arrow/arrow_reader/struct.RowSelection.html">RowSelections</a>.</p>
+
+<p>For example, consider scanning Columns A and B, stored in 5 Data Pages as shown in the figure below.</p>
+
+<p>If the predicate is <code class="language-plaintext highlighter-rouge">A &gt; 35</code>:</p>
+
+<ul>
+  <li>Page 1 is pruned using the page index (its maximum value is <code class="language-plaintext highlighter-rouge">20</code>), leaving a RowSelection of <code class="language-plaintext highlighter-rouge">[200-&gt;onwards]</code></li>
+  <li>The Parquet reader skips Page 3 of Column B entirely (its last row index is <code class="language-plaintext highlighter-rouge">99</code>)</li>
+  <li>Only the relevant rows are fetched by reading Pages 2, 4, and 5</li>
+</ul>
+
+<p>If the predicate is instead <code class="language-plaintext highlighter-rouge">A &gt; 35 AND B = "F"</code>, the page index is even more effective:</p>
+
+<ul>
+  <li>Using <code class="language-plaintext highlighter-rouge">A &gt; 35</code> yields a RowSelection of <code class="language-plaintext highlighter-rouge">[200-&gt;onwards]</code>, as before</li>
+  <li>Using <code class="language-plaintext highlighter-rouge">B = "F"</code>, on the remaining Page 4 and Page 5 of B, yields a RowSelection of <code class="language-plaintext highlighter-rouge">[100-244]</code></li>
+  <li>Intersecting the two RowSelections leaves a combined RowSelection <code class="language-plaintext highlighter-rouge">[200-244]</code></li>
+  <li>Parquet reader only decodes those 50 rows from Page 2 and Page 4.</li>
+</ul>
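The intersection step above amounts to intersecting sorted lists of row ranges. A minimal sketch follows; the parquet crate's RowSelection type implements this logic internally with a different representation, so the function name and range encoding here are illustrative:

```python
def intersect(a, b):
    """Intersect two sorted, non-overlapping lists of inclusive (start, end) row ranges."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        # Advance whichever range ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

# [200->onwards] (capped at a hypothetical 499 rows) intersected with [100-244].
assert intersect([(200, 499)], [(100, 244)]) == [(200, 244)]
```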
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┏━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━
+   ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ┃
+┃     ┌──────────────┐  │     ┌──────────────┐  │  ┃
+┃  │  │              │     │  │              │     ┃
+┃     │              │  │     │     Page     │  │
+   │  │              │     │  │      3       │     ┃
+┃     │              │  │     │   min: "A"   │  │  ┃
+┃  │  │              │     │  │   max: "C"   │     ┃
+┃     │     Page     │  │     │ first_row: 0 │  │
+   │  │      1       │     │  │              │     ┃
+┃     │   min: 10    │  │     └──────────────┘  │  ┃
+┃  │  │   max: 20    │     │  ┌──────────────┐     ┃
+┃     │ first_row: 0 │  │     │              │  │
+   │  │              │     │  │     Page     │     ┃
+┃     │              │  │     │      4       │  │  ┃
+┃  │  │              │     │  │   min: "D"   │     ┃
+┃     │              │  │     │   max: "G"   │  │
+   │  │              │     │  │first_row: 100│     ┃
+┃     └──────────────┘  │     │              │  │  ┃
+┃  │  ┌──────────────┐     │  │              │     ┃
+┃     │              │  │     └──────────────┘  │
+   │  │     Page     │     │  ┌──────────────┐     ┃
+┃     │      2       │  │     │              │  │  ┃
+┃  │  │   min: 30    │     │  │     Page     │     ┃
+┃     │   max: 40    │  │     │      5       │  │
+   │  │first_row: 200│     │  │   min: "H"   │     ┃
+┃     │              │  │     │   max: "Z"   │  │  ┃
+┃  │  │              │     │  │first_row: 250│     ┃
+┃     └──────────────┘  │     │              │  │
+   │                       │  └──────────────┘     ┃
+┃   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘  ┃
+┃       ColumnChunk            ColumnChunk         ┃
+┃            A                      B
+ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━━ ━━┛
+</code></pre></div></div>
+
+<p>Support for reading and writing these indexes from Arrow C++, and by extension pyarrow/pandas, is tracked in <a href="https://issues.apache.org/jira/browse/PARQUET-1404">PARQUET-1404</a>.</p>
+
+<h2 id="late-materialization">Late materialization</h2>
+
+<p>The two previous forms of predicate pushdown operate only on metadata stored for <code class="language-plaintext highlighter-rouge">RowGroup</code>s, <code class="language-plaintext highlighter-rouge">ColumnChunk</code>s, and Data Pages, before decoding any values. However, the same techniques also extend to the values of one or more columns <em>after</em> decoding them but prior to decoding other columns, a technique often called “late materialization”.</p>
+
+<p>This technique is especially effective when:</p>
+
+<ul>
+  <li>The predicate is very selective, i.e. filters out large numbers of rows</li>
+  <li>Each row is large, either due to wide rows (e.g. JSON blobs) or many columns</li>
+  <li>The selected data is clustered together</li>
+  <li>The columns required by the predicate are relatively inexpensive to decode, e.g. PrimitiveArray / DictionaryArray</li>
+</ul>
+
+<p>There is additional discussion about the benefits of this technique in <a href="https://issues.apache.org/jira/browse/SPARK-36527">SPARK-36527</a> and <a href="https://docs.cloudera.com/cdw-runtime/cloud/impala-reference/topics/impala-lazy-materialization.html">Impala</a>.</p>
+
+<p>For example, given the predicate <code class="language-plaintext highlighter-rouge">A &gt; 35 AND B = "F"</code> from above, where the engine used the page index to determine that only 50 rows within the RowSelection of <code class="language-plaintext highlighter-rouge">[200-244]</code> could match, using late materialization the Parquet decoder:</p>
+
+<ul>
+  <li>Decodes the 50 values of Column A</li>
+  <li>Evaluates <code class="language-plaintext highlighter-rouge">A &gt; 35</code> on those 50 values</li>
+  <li>In this case, only 5 rows pass, resulting in the RowSelection:
+    <ul>
+      <li>RowSelection[205-206]</li>
+      <li>RowSelection[238-240]</li>
+    </ul>
+  </li>
+  <li>Decodes only those 5 rows of Column B</li>
+</ul>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  Row Index
+             ┌────────────────────┐            ┌────────────────────┐
+       200   │         30         │            │        "F"         │
+             └────────────────────┘            └────────────────────┘
+                      ...                               ...
+             ┌────────────────────┐            ┌────────────────────┐
+       205   │         37         │─ ─ ─ ─ ─ ─▶│        "F"         │
+             ├────────────────────┤            ├────────────────────┤
+       206   │         36         │─ ─ ─ ─ ─ ─▶│        "G"         │
+             └────────────────────┘            └────────────────────┘
+                      ...                               ...
+             ┌────────────────────┐            ┌────────────────────┐
+       238   │         36         │─ ─ ─ ─ ─ ─▶│        "F"         │
+             ├────────────────────┤            ├────────────────────┤
+       239   │         36         │─ ─ ─ ─ ─ ─▶│        "G"         │
+             ├────────────────────┤            ├────────────────────┤
+       240   │         40         │─ ─ ─ ─ ─ ─▶│        "G"         │
+             └────────────────────┘            └────────────────────┘
+                      ...                               ...
+             ┌────────────────────┐            ┌────────────────────┐
+      244    │         26         │            │        "D"         │
+             └────────────────────┘            └────────────────────┘
+
+
+                   Column A                          Column B
+                    Values                            Values
+</code></pre></div></div>
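The narrowing shown in the figure above can be sketched as filtering decoded values and collapsing the surviving row indices back into ranges. The helper names and the (row, value) pairs below are illustrative, not the crate's RowFilter API:

```python
def filter_rows(decoded, predicate):
    """Apply a predicate to (row_index, value) pairs decoded for one column."""
    return [row for row, value in decoded if predicate(value)]

def to_ranges(rows):
    """Collapse sorted row indices into inclusive (start, end) ranges."""
    ranges = []
    for row in rows:
        if ranges and row == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], row)
        else:
            ranges.append((row, row))
    return ranges

# Values of Column A from the figure: only 5 of the selected rows pass A > 35.
column_a = [(200, 30), (205, 37), (206, 36), (238, 36), (239, 36), (240, 40), (244, 26)]
assert to_ranges(filter_rows(column_a, lambda v: v > 35)) == [(205, 206), (238, 240)]
```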
+
+<p>In certain cases, such as our example where B stores single character values, the cost of late materialization machinery can outweigh the savings in decoding. However, the savings can be substantial when some of the conditions listed above are fulfilled. The query engine must decide which predicates to push down and in which order to apply them for optimal results.</p>
+
+<p>While it is outside the scope of this document, the same technique can be applied for multiple predicates as well as predicates on multiple columns. See the <a href="https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.RowFilter.html">RowFilter</a> interface in the Parquet crate for more information, and the <a href="https://github.com/apache/arrow-datafusion/blob/58b43f5c0b629be49a3efa0e37052ec51d9ba3fe/datafusion/core/src/physical_plan/file_format/parquet/row_filter.rs#L [...]
+
+<h1 id="io-pushdown">I/O pushdown</h1>
+
+<p>While Parquet was designed for efficient access on the <a href="https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html">HDFS distributed file system</a>, it works very well with commodity blob storage systems such as AWS S3, as they have very similar characteristics:</p>
+
+<ul>
+  <li><strong>Relatively slow “random access” reads</strong>: it is much more efficient to read large (MBs) sections of data in each request than issue many requests for smaller portions</li>
+  <li><strong>Significant latency before retrieving the first byte</strong></li>
+  <li><strong>High per-request cost</strong>: often billed per request, regardless of the number of bytes read, which incentivizes fewer requests that each read a large contiguous section of data</li>
+</ul>
+
+<p>To read optimally from such systems, a Parquet reader must:</p>
+
+<ol>
+  <li>Minimize the number of I/O requests, while also applying the various pushdown techniques to avoid fetching large amounts of unused data.</li>
+  <li>Integrate with the appropriate task scheduling mechanism to interleave I/O and processing on the data that is fetched to avoid pipeline bottlenecks.</li>
+</ol>
+
+<p>As these are substantial engineering and integration challenges, many Parquet readers still require the files to be fetched in their entirety to local storage.</p>
+
+<p>Fetching entire files in order to process them is not ideal for several reasons:</p>
+
+<ol>
+  <li><strong>High Latency</strong>: Decoding cannot begin until the entire file is fetched (Parquet metadata is at the end of the file, so the decoder must see the end prior to decoding the rest)</li>
+  <li><strong>Wasted work</strong>: Fetching the entire file retrieves all the necessary data, but also potentially large amounts of unneeded data that will be skipped after reading the footer, increasing cost unnecessarily.</li>
+  <li><strong>Requires costly “locally attached” storage (or memory)</strong>: Many cloud environments do not offer computing resources with locally attached storage – they either rely on expensive network block storage such as AWS EBS or else restrict local storage to certain classes of VMs.</li>
+</ol>
+
+<p>Avoiding the need to buffer the entire file requires a sophisticated Parquet decoder, integrated with the I/O subsystem, that can initially fetch and decode the metadata followed by ranged fetches for the relevant data blocks, interleaved with the decoding of Parquet data. This optimization requires careful engineering to fetch large enough blocks of data from the object store that the per request overhead doesn’t dominate gains from reducing the bytes transferred. <a href="https://is [...]
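The initial metadata fetch relies on the Parquet footer layout: a file ends with a 4-byte little-endian metadata length followed by the 4-byte magic "PAR1", so a reader can locate the footer metadata after one small read from the end of the file. A minimal sketch, assuming the caller has already fetched the file size and the last 8 bytes:

```python
import struct

def footer_metadata_range(file_size, tail8):
    """Given the last 8 bytes of a Parquet file, return the (start, end) byte
    range of the footer metadata. Parquet files end with a 4-byte little-endian
    metadata length followed by the magic bytes b"PAR1"."""
    assert tail8[4:] == b"PAR1", "not a Parquet file"
    (meta_len,) = struct.unpack("<I", tail8[:4])
    end = file_size - 8  # metadata sits immediately before length + magic
    return (end - meta_len, end)

# A hypothetical 1000-byte file whose footer metadata occupies 100 bytes.
assert footer_metadata_range(1000, struct.pack("<I", 100) + b"PAR1") == (892, 992)
```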
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                       ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
+                                                                │
+                       │
+               Step 1: Fetch                                    │
+ Parquet       Parquet metadata
+ file on ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━▼━━━━━━━┓
+ Remote  ┃      ▒▒▒▒▒▒▒▒▒▒          ▒▒▒▒▒▒▒▒▒▒               ░░░░░░░░░░ ┃
+ Object  ┃      ▒▒▒data▒▒▒          ▒▒▒data▒▒▒               ░metadata░ ┃
+  Store  ┃      ▒▒▒▒▒▒▒▒▒▒          ▒▒▒▒▒▒▒▒▒▒               ░░░░░░░░░░ ┃
+         ┗━━━━━━━━━━━▲━━━━━━━━━━━━━━━━━━━━━▲━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
+                     │                     └ ─ ─ ─
+                                                  │
+                     │                   Step 2: Fetch only
+                      ─ ─ ─ ─ ─ ─ ─ ─ ─ relevant data blocks
+</code></pre></div></div>
+
+<p>Not shown in this diagram are details an actual implementation needs, such as coalescing adjacent requests and enforcing minimum request sizes.</p>
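Request coalescing can be sketched as merging byte ranges whose gaps are small relative to per-request overhead. This is a simplified illustration; the max_gap parameter and range encoding are assumptions, and production readers tune this trade-off against request cost and bandwidth:

```python
def coalesce_ranges(ranges, max_gap):
    """Merge byte ranges separated by at most `max_gap` bytes, trading a little
    over-read for far fewer object store requests."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Two nearby column chunk reads become one request; the distant one stays separate.
assert coalesce_ranges([(0, 100), (150, 300), (10_000, 11_000)], 1_000) == [
    (0, 300),
    (10_000, 11_000),
]
```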
+
+<p>The Rust Parquet crate provides an async Parquet reader that can efficiently read from any <a href="https://docs.rs/parquet/latest/parquet/arrow/async_reader/trait.AsyncFileReader.html">AsyncFileReader</a>. This reader:</p>
+
+<ul>
+  <li>Efficiently reads from any storage medium that supports range requests</li>
+  <li>Integrates with Rust’s futures ecosystem to avoid blocking threads waiting on network I/O, and <a href="https://www.influxdata.com/blog/using-rustlangs-async-tokio-runtime-for-cpu-bound-tasks/">can easily interleave CPU and network work</a></li>
+  <li>Requests multiple ranges simultaneously, to allow the implementation to coalesce adjacent ranges, fetch ranges in parallel, etc.</li>
+  <li>Uses the pushdown techniques described previously to eliminate fetching data where possible</li>
+  <li>Integrates easily with the Apache Arrow <a href="https://docs.rs/object_store/latest/object_store/">object_store</a> crate which you can read more about <a href="https://www.influxdata.com/blog/rust-object-store-donation/">here</a></li>
+</ul>
+
+<p>To give a sense of what is possible, the following picture shows a timeline of fetching the footer metadata from remote files, using that metadata to determine what Data Pages to read, and then fetching data and decoding simultaneously. This process often must be done for more than one file at a time in order to match network latency, bandwidth, and available CPU.</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                           begin
+          metadata        read of   end read
+            read  ─ ─ ─ ┐   data    of data          │
+ begin    complete         block     block
+read of                 │   │        │               │
+metadata  ─ ─ ─ ┐                                       At any time, there are
+             │          │   │        │               │     multiple network
+             │  ▼       ▼   ▼        ▼                  requests outstanding to
+  file 1     │ ░░░░░░░░░░   ▒▒▒read▒▒▒   ▒▒▒read▒▒▒  │    hide the individual
+             │ ░░░read░░░   ▒▒▒data▒▒▒   ▒▒▒data▒▒▒        request latency
+             │ ░metadata░                         ▓▓decode▓▓
+             │ ░░░░░░░░░░                         ▓▓▓data▓▓▓
+             │                                       │
+             │
+             │ ░░░░░░░░░░  ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒read▒▒▒▒│▒▒▒▒▒▒▒▒▒▒▒▒▒▒
+   file 2    │ ░░░read░░░  ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒data▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒
+             │ ░metadata░                            │              ▓▓▓▓▓decode▓▓▓▓▓▓
+             │ ░░░░░░░░░░                                           ▓▓▓▓▓▓data▓▓▓▓▓▓▓
+             │                                       │
+             │
+             │                                     ░░│░░░░░░░  ▒▒▒read▒▒▒  ▒▒▒▒read▒▒▒▒▒
+   file 3    │                                     ░░░read░░░  ▒▒▒data▒▒▒  ▒▒▒▒data▒▒▒▒▒      ...
+             │                                     ░m│tadata░            ▓▓decode▓▓
+             │                                     ░░░░░░░░░░            ▓▓▓data▓▓▓
+             └───────────────────────────────────────┼──────────────────────────────▶Time
+
+
+                                                     │
+</code></pre></div></div>
+
+<h2 id="conclusion">Conclusion</h2>
+
+<p>We hope you enjoyed reading about the Parquet file format and the various techniques used to quickly query Parquet files.</p>
+
+<p>We believe that most open source implementations of Parquet lack the breadth of features described in this post because building them takes a monumental effort, one previously feasible only at well-financed commercial enterprises that kept their implementations closed source.</p>
+
+<p>However, with the growth and quality of the Apache Arrow community, both Rust practitioners and the wider Arrow community, our ability to collaborate and build a cutting-edge open source implementation is exhilarating and immensely satisfying. The technology described in this blog is the result of the contributions of many engineers spread across companies, hobbyists, and the world in several repositories, notably <a href="https://github.com/apache/arrow-datafusion">Apache Arrow DataF [...]
+
+<p>If you are interested in joining the DataFusion Community, please <a href="https://arrow.apache.org/datafusion/contributor-guide/communication.html">get in touch</a>.</p>]]></content><author><name>tustvold and alamb</name></author><category term="parquet" /><summary type="html"><![CDATA[Querying Parquet with Millisecond Latency Note: this article was originally published on the InfluxData Blog. We believe that querying data in Apache Parquet files directly can achieve similar or bette [...]
 
 -->
 
@@ -1645,122 +2206,4 @@ ways to engage with the community.</p>
 
 <p>In our <a href="https://arrow.apache.org/blog/2022/10/17/arrow-parquet-encoding-part-3/">final blog post</a>, we explain how Parquet and Arrow combine these concepts to support arbitrary nesting of potentially nullable data structures.</p>
 
-<p>If you want to store and process structured types, you will be pleased to hear that the Rust <a href="https://crates.io/crates/parquet">parquet</a> implementation fully supports reading and writing directly into Arrow, as simply as any other type. All the complex record shredding and reconstruction is handled automatically. With this and other exciting features such as  <a href="https://docs.rs/parquet/22.0.0/parquet/arrow/async_reader/index.html">reading asynchronously</a> from <a hr [...]
-
--->
-
-<h2 id="introduction">Introduction</h2>
-
-<p>We recently completed a long-running project within <a href="https://github.com/apache/arrow-rs">Rust Apache Arrow</a> to complete support for reading and writing arbitrarily nested Parquet and Arrow schemas. This is a complex topic, and we encountered a lack of approachable technical information, and thus wrote this blog to share our learnings with the community.</p>
-
-<p><a href="https://arrow.apache.org/">Apache Arrow</a> is an open, language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations. <a href="https://parquet.apache.org/">Apache Parquet</a> is an open, column-oriented data file format designed for very efficient data encoding and retrieval.</p>
-
-<p>It is increasingly common for analytic systems to use Arrow to process data stored in Parquet files, and therefore fast, efficient, and correct translation between them is a key building block.</p>
-
-<p>Historically analytic processing primarily focused on querying data with a tabular schema, where there are a fixed number of columns, and each row contains a single value for each column. However, with the increasing adoption of structured document formats such as XML, JSON, etc…, only supporting tabular schema can be frustrating for users, as it necessitates often non-trivial data transformation to first flatten the document data.</p>
-
-<p>As of version <a href="https://crates.io/crates/arrow/20.0.0">20.0.0</a>, released in August 2022, the Rust Arrow implementation for reading structured types is feature complete. Instructions for getting started can be found <a href="https://docs.rs/parquet/latest/parquet/arrow/index.html">here</a> and feel free to raise any issues on our <a href="https://github.com/apache/arrow-rs/issues">bugtracker</a>.</p>
-
-<p>In this series we will explain how Parquet and Arrow represent nested data, highlighting the similarities and differences between them, and give a flavor of the practicalities of converting between the formats.</p>
-
-<h2 id="columnar-vs-record-oriented">Columnar vs Record-Oriented</h2>
-
-<p>First, it is necessary to take a step back and discuss the difference between columnar and record-oriented data formats. In a record oriented data format, such as newline-delimited JSON (NDJSON), all the values for a given record are stored contiguously.</p>
-
-<p>For example</p>
-
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="s">"Column1"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="s">"Column2"</span><span class="p">:</span> <span class="mi">2</span><span class="p">}</span>
-<span class="p">{</span><span class="s">"Column1"</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span> <span class="s">"Column2"</span><span class="p">:</span> <span class="mi">4</span><span class="p">,</span> <span class="s">"Column3"</span><span class="p">:</span> <span class="mi">5</span><span class="p">}</span>
-<span class="p">{</span><span class="s">"Column1"</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span> <span class="s">"Column2"</span><span class="p">:</span> <span class="mi">4</span><span class="p">,</span> <span class="s">"Column3"</span><span class="p">:</span> <span class="mi">5</span><span class="p">}</span>
-</code></pre></div></div>
-
-<p>In a columnar representation, the data for a given column is instead stored contiguously</p>
-
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Column1</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>
-<span class="n">Column2</span><span class="p">:</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">4</span><span class="p">]</span>
-<span class="n">Column3</span><span class="p">:</span> <span class="p">[</span><span class="n">null</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>
-</code></pre></div></div>
-
-<p>Aside from potentially yielding better data compression, a columnar layout can dramatically improve performance of certain queries. This is because laying data out contiguously in memory allows both the compiler and CPU to better exploit opportunities for parallelism. The specifics of <a href="https://en.wikipedia.org/wiki/Single_instruction,_multiple_data">SIMD</a> and <a href="https://en.wikipedia.org/wiki/Instruction-level_parallelism">ILP</a> are well beyond the scope of this post [...]
-
-<h2 id="parquet-vs-arrow">Parquet vs Arrow</h2>
-<p>Parquet and Arrow are complementary technologies, and they make some different design tradeoffs. In particular, Parquet is a storage format designed for maximum space efficiency, whereas Arrow is an in-memory format intended for operation by vectorized computational kernels.</p>
-
-<p>The major distinction is that Arrow provides <code class="language-plaintext highlighter-rouge">O(1)</code> random access lookups to any array index, whilst Parquet does not. In particular, Parquet uses <a href="https://akshays-blog.medium.com/wrapping-head-around-repetition-and-definition-levels-in-dremel-powering-bigquery-c1a33c9695da">dremel record shredding</a>, <a href="https://github.com/apache/parquet-format/blob/master/Encodings.md">variable length encoding schemes</a>, and <a [...]
-
-<p>A common pattern that plays to each technologies strengths, is to stream data from a compressed representation, such as Parquet, in thousand row batches in the Arrow format, process these batches individually, and accumulate the results in a more compressed representation. This benefits from the ability to efficiently perform computations on Arrow data, whilst keeping memory requirements in check, and allowing the computation kernels to be agnostic to the encodings of the source and d [...]
-
-<p><strong>Arrow is primarily an in-memory format, whereas Parquet is a storage format.</strong></p>
-
-<h2 id="non-nullable-primitive-column">Non-Nullable Primitive Column</h2>
-
-<p>Let us start with the simplest case of a non-nullable list of 32-bit signed integers.</p>
-
-<p>In Arrow this would be represented as a <code class="language-plaintext highlighter-rouge">PrimitiveArray</code>, which would store them contiguously in memory</p>
-
-<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────┐
-│  1  │
-├─────┤
-│  2  │
-├─────┤
-│  3  │
-├─────┤
-│  4  │
-└─────┘
-Values
-</code></pre></div></div>
-
-<p>Parquet has multiple <a href="https://parquet.apache.org/docs/file-format/data-pages/encodings/">different encodings</a> that may be used for integer types, the exact details of which are beyond the scope of this post. Broadly speaking the data will be stored in one or more <a href="https://parquet.apache.org/docs/file-format/data-pages/"><em>DataPage</em></a>s containing the integers in an encoded form</p>
-
-<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────┐
-│  1  │
-├─────┤
-|  2  │
-├─────┤
-│  3  │
-├─────┤
-│  4  │
-└─────┘
-Values
-</code></pre></div></div>
-
-<h1 id="nullable-primitive-column">Nullable Primitive Column</h1>
-
-<p>Now let us consider the case of a nullable column, where some of the values might have the special sentinel value <code class="language-plaintext highlighter-rouge">NULL</code> that designates “this value is unknown”.</p>
-
-<p>In Arrow, nulls are stored separately from the values in the form of a <a href="https://arrow.apache.org/docs/format/Columnar.html#validity-bitmaps">validity bitmask</a>, with arbitrary data in the corresponding positions in the values buffer. This space efficient encoding means that the entire validity mask for the following example is stored using 5 bits</p>
-
-<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────┐   ┌─────┐
-│  1  │   │  1  │
-├─────┤   ├─────┤
-│  0  │   │ ??  │
-├─────┤   ├─────┤
-│  1  │   │  3  │
-├─────┤   ├─────┤
-│  1  │   │  4  │
-├─────┤   ├─────┤
-│  0  │   │ ??  │
-└─────┘   └─────┘
-Validity   Values
-</code></pre></div></div>
-
-<p>In Parquet the validity information is also stored separately from the values, however, instead of being encoded as a validity bitmask it is encoded as a list of 16-bit integers called <em>definition levels</em>. Like other data in Parquet, these integer definition levels are stored using high efficiency encoding, and will be expanded upon in the next post, but for now a definition level of <code class="language-plaintext highlighter-rouge">1</code> indicates a valid value, and <code  [...]
-
-<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────┐    ┌─────┐
-│  1  │    │  1  │
-├─────┤    ├─────┤
-│  0  │    │  3  │
-├─────┤    ├─────┤
-│  1  │    │  4  │
-├─────┤    └─────┘
-│  1  │
-├─────┤
-│  0  │
-└─────┘
-Definition  Values
- Levels
-</code></pre></div></div>
-
-<h2 id="next-up-nested-and-hierarchical-data">Next up: Nested and Hierarchical Data</h2>
-
-<p>Armed with the foundational understanding of how Arrow and Parquet store nullability / definition differently we are ready to move on to more complex nested types, which you can read about in our <a href="https://arrow.apache.org/blog/2022/10/08/arrow-parquet-encoding-part-2/">next blog post on the topic</a>.</p>]]></content><author><name>tustvold and alamb</name></author><category term="parquet" /><category term="arrow" /><summary type="html"><![CDATA[Introduction We recently complet [...]
\ No newline at end of file
+<p>If you want to store and process structured types, you will be pleased to hear that the Rust <a href="https://crates.io/crates/parquet">parquet</a> implementation fully supports reading and writing directly into Arrow, as simply as any other type. All the complex record shredding and reconstruction is handled automatically. With this and other exciting features such as  <a href="https://docs.rs/parquet/22.0.0/parquet/arrow/async_reader/index.html">reading asynchronously</a> from <a hr [...]
\ No newline at end of file
diff --git a/release/0.1.0.html b/release/0.1.0.html
index 8629ce8cb5c..2155f78379a 100644
--- a/release/0.1.0.html
+++ b/release/0.1.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="0.1.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.1.0 (10 October 2016) Download Source Release: [apache-arrow-0.1.0.tar.gz][6] Verification: [md5][3], [asc][7] Changelog Contributors $ git shortlog -sn d5aa7c46..apache-arrow-0.1.0 49 Wes McKinney 27 Uwe L. Korn 25 Julien Le Dem 13 Micah Kornfield 11 Steven Phillips 6 Jihoon Son 5 Laurent Goujon 5 adeneche 4 Dan Robin [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.1.0 (10 October 2016) Download Source Release: [apache-arrow-0.1.0.tar.gz][6] Verification: [md5][3], [asc][7] Changelog Contributors $ git shortlog -sn d5aa7c46..apache-arrow-0.1.0 49 Wes McKinney 27 Uwe L. Korn 25 Julien Le Dem 13 Micah Kornfield 11 Steven Phillips 6 Jihoon Son 5 Laurent Goujon 5 adeneche 4 Dan Robin [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.10.0.html b/release/0.10.0.html
index 32c78809c8f..45f5e57dcaa 100644
--- a/release/0.10.0.html
+++ b/release/0.10.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.10.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.10.0 (6 August 2018) This is a major release. Download Source Artifacts Binary Artifacts Git tag Contributors $ git shortlog -sn apache-arrow-0.9.0..apache-arrow-0.10.0 70 Antoine Pitrou 49 Kouhei Sutou 40 Korn, Uwe 37 Wes McKinney 32 Krisztián Szűcs 30 Andy Grove 20 Philipp Moritz 13 Phillip Cloud 11 Bryan Cutler 11 y [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.10.0 (6 August 2018) This is a major release. Download Source Artifacts Binary Artifacts Git tag Contributors $ git shortlog -sn apache-arrow-0.9.0..apache-arrow-0.10.0 70 Antoine Pitrou 49 Kouhei Sutou 40 Korn, Uwe 37 Wes McKinney 32 Krisztián Szűcs 30 Andy Grove 20 Philipp Moritz 13 Phillip Cloud 11 Bryan Cutler 11 y [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.11.0.html b/release/0.11.0.html
index 3b529eed21d..1ff02157797 100644
--- a/release/0.11.0.html
+++ b/release/0.11.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.11.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.11.0 (8 October 2018) This is a major release. Download Source Artifacts Binary Artifacts Git tag Contributors This includes patches from Apache Parquet that were merged. $ git shortlog -sn apache-arrow-0.10.0..apache-arrow-0.11.0 166 Wes McKinney 59 Uwe L. Korn 57 Deepak Majeti 54 Kouhei Sutou 50 Krisztián Szűcs 48 An [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.11.0 (8 October 2018) This is a major release. Download Source Artifacts Binary Artifacts Git tag Contributors This includes patches from Apache Parquet that were merged. $ git shortlog -sn apache-arrow-0.10.0..apache-arrow-0.11.0 166 Wes McKinney 59 Uwe L. Korn 57 Deepak Majeti 54 Kouhei Sutou 50 Krisztián Szűcs 48 An [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.11.1.html b/release/0.11.1.html
index f87ccb12a5b..090a2056ce4 100644
--- a/release/0.11.1.html
+++ b/release/0.11.1.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.11.1 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.11.1 (19 October 2018) This is a bugfix release to address a Python packaging issue with zlib that resulted in bug ARROW-3514. Download Source Artifacts Binary Artifacts Git tag Changelog New Features and Improvements ARROW-3353 - [Packaging] Build python 3.7 wheels ARROW-3534 - [Python] Update zlib library in manylinu [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.11.1 (19 October 2018) This is a bugfix release to address a Python packaging issue with zlib that resulted in bug ARROW-3514. Download Source Artifacts Binary Artifacts Git tag Changelog New Features and Improvements ARROW-3353 - [Packaging] Build python 3.7 wheels ARROW-3534 - [Python] Update zlib library in manylinu [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.12.0.html b/release/0.12.0.html
index 675557a01eb..bffe39a892d 100644
--- a/release/0.12.0.html
+++ b/release/0.12.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.12.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.12.0 (20 January 2019) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts Git tag 8ca41384b5324bfd0ef3d3ed3f728e1d10ed73f0 Contributors This release includes 601 commits from 77 distinct contributors. $ git shortlog -sn apache-arrow-0.11.0..apache-arrow-0.12.0 [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.12.0 (20 January 2019) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts Git tag 8ca41384b5324bfd0ef3d3ed3f728e1d10ed73f0 Contributors This release includes 601 commits from 77 distinct contributors. $ git shortlog -sn apache-arrow-0.11.0..apache-arrow-0.12.0 [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.13.0.html b/release/0.13.0.html
index 52529b2db6c..589ac2bd4d2 100644
--- a/release/0.13.0.html
+++ b/release/0.13.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.13.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.13.0 (1 April 2019) This is a major release covering more than 2 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 550 commits from 81 distinct contributors. $ git shortlog -sn apache-arrow-0.12.0..apache-arrow-0.13.0 [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.13.0 (1 April 2019) This is a major release covering more than 2 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 550 commits from 81 distinct contributors. $ git shortlog -sn apache-arrow-0.12.0..apache-arrow-0.13.0 [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.14.0.html b/release/0.14.0.html
index 8c165b5b5a7..b2ca6d960e6 100644
--- a/release/0.14.0.html
+++ b/release/0.14.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.14.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.14.0 (4 July 2019) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 570 commits from 78 distinct contributors. $ git shortlog -sn apache-arrow-0.13.0..apache-arrow-0.14.0  [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.14.0 (4 July 2019) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 570 commits from 78 distinct contributors. $ git shortlog -sn apache-arrow-0.13.0..apache-arrow-0.14.0  [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.14.1.html b/release/0.14.1.html
index 6449c8f93fa..02384901bf6 100644
--- a/release/0.14.1.html
+++ b/release/0.14.1.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.14.1 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.14.1 (22 July 2019) This is a bugfix release to address a Python wheel packaging issues and Parquet forward compatibility problems. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 52 commits from 16 distinct contributors. $ git shortlog - [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.14.1 (22 July 2019) This is a bugfix release to address a Python wheel packaging issues and Parquet forward compatibility problems. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 52 commits from 16 distinct contributors. $ git shortlog - [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.15.0.html b/release/0.15.0.html
index e6e955b04be..ae67ede6cd4 100644
--- a/release/0.15.0.html
+++ b/release/0.15.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.15.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.15.0 (5 October 2019) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 672 commits from 80 distinct contributors. $ git shortlog -sn apache-arrow-0.14.0..apache-arrow-0.15 [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.15.0 (5 October 2019) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 672 commits from 80 distinct contributors. $ git shortlog -sn apache-arrow-0.14.0..apache-arrow-0.15 [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.15.1.html b/release/0.15.1.html
index c4c914d3c98..52bd08f7094 100644
--- a/release/0.15.1.html
+++ b/release/0.15.1.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.15.1 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.15.1 (1 November 2019) This is a major release covering more than 1 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 41 commits from 13 distinct contributors. $ git shortlog -sn apache-arrow-0.15.0..apache-arrow-0.15 [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.15.1 (1 November 2019) This is a major release covering more than 1 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 41 commits from 13 distinct contributors. $ git shortlog -sn apache-arrow-0.15.0..apache-arrow-0.15 [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.16.0.html b/release/0.16.0.html
index 5639fa01a1d..465b74128c5 100644
--- a/release/0.16.0.html
+++ b/release/0.16.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.16.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.16.0 (7 February 2020) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 709 commits from 99 distinct contributors. $ git shortlog -sn apache-arrow-0.15.1..apache-arrow-0.1 [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.16.0 (7 February 2020) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 709 commits from 99 distinct contributors. $ git shortlog -sn apache-arrow-0.15.1..apache-arrow-0.1 [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.17.0.html b/release/0.17.0.html
index 9216144f91b..296c99fe7a3 100644
--- a/release/0.17.0.html
+++ b/release/0.17.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.17.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.17.0 (20 April 2020) This is a major release covering more than 2 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 569 commits from 79 distinct contributors. $ git shortlog -sn apache-arrow-0.16.0..apache-arrow-0.17. [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.17.0 (20 April 2020) This is a major release covering more than 2 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 569 commits from 79 distinct contributors. $ git shortlog -sn apache-arrow-0.16.0..apache-arrow-0.17. [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.17.1.html b/release/0.17.1.html
index c806de01839..315631914cf 100644
--- a/release/0.17.1.html
+++ b/release/0.17.1.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.17.1 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.17.1 (18 May 2020) This is a patch release fixing bugs and regressions listed in the changelog below. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 27 commits from 11 distinct contributors. $ git shortlog -sn apache-arrow-0.17.0..apache [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.17.1 (18 May 2020) This is a patch release fixing bugs and regressions listed in the changelog below. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 27 commits from 11 distinct contributors. $ git shortlog -sn apache-arrow-0.17.0..apache [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.2.0.html b/release/0.2.0.html
index cc0f3f9aff0..b7218499504 100644
--- a/release/0.2.0.html
+++ b/release/0.2.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="0.2.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.2.0 (18 February 2017) Download Source Artifacts Git tag Changelog Contributors $ git shortlog -sn apache-arrow-0.1.0..apache-arrow-0.2.0 73 Wes McKinney 55 Uwe L. Korn 16 Julien Le Dem 4 Bryan Cutler 4 Nong Li 2 Christopher C. Aycock 2 Jingyuan Wang 2 Kouhei Sutou 2 Laurent Goujon 2 Leif Walsh 1 Emilio Lahr-Vivaz 1 Ho [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.2.0 (18 February 2017) Download Source Artifacts Git tag Changelog Contributors $ git shortlog -sn apache-arrow-0.1.0..apache-arrow-0.2.0 73 Wes McKinney 55 Uwe L. Korn 16 Julien Le Dem 4 Bryan Cutler 4 Nong Li 2 Christopher C. Aycock 2 Jingyuan Wang 2 Kouhei Sutou 2 Laurent Goujon 2 Leif Walsh 1 Emilio Lahr-Vivaz 1 Ho [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.3.0.html b/release/0.3.0.html
index cc56caf77da..02240994ec8 100644
--- a/release/0.3.0.html
+++ b/release/0.3.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="0.3.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.3.0 (5 May 2017) Read more in the release blog post Download Source Artifacts Git tag d8db8f8 Changelog Contributors $ git shortlog -sn apache-arrow-0.2.0..apache-arrow-0.3.0 119 Wes McKinney 55 Kouhei Sutou 18 Uwe L. Korn 17 Julien Le Dem 9 Phillip Cloud 6 Bryan Cutler 5 Emilio Lahr-Vivaz 5 Philipp Moritz 4 Jeff Knupp [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.3.0 (5 May 2017) Read more in the release blog post Download Source Artifacts Git tag d8db8f8 Changelog Contributors $ git shortlog -sn apache-arrow-0.2.0..apache-arrow-0.3.0 119 Wes McKinney 55 Kouhei Sutou 18 Uwe L. Korn 17 Julien Le Dem 9 Phillip Cloud 6 Bryan Cutler 5 Emilio Lahr-Vivaz 5 Philipp Moritz 4 Jeff Knupp [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.4.0.html b/release/0.4.0.html
index 77822fb9e61..1614eaa35be 100644
--- a/release/0.4.0.html
+++ b/release/0.4.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="0.4.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.4.0 (22 May 2017) Read more in the release blog post Download Source Artifacts Git tag a8f8ba0 Changelog Contributors $ git shortlog -sn apache-arrow-0.3.0..apache-arrow-0.4.0 28 Wes McKinney 18 Kouhei Sutou 9 Uwe L. Korn 3 Brian Hulette 3 Emilio Lahr-Vivaz 3 Philipp Moritz 3 Phillip Cloud 2 Julien Le Dem 1 Bryan Cutle [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.4.0 (22 May 2017) Read more in the release blog post Download Source Artifacts Git tag a8f8ba0 Changelog Contributors $ git shortlog -sn apache-arrow-0.3.0..apache-arrow-0.4.0 28 Wes McKinney 18 Kouhei Sutou 9 Uwe L. Korn 3 Brian Hulette 3 Emilio Lahr-Vivaz 3 Philipp Moritz 3 Phillip Cloud 2 Julien Le Dem 1 Bryan Cutle [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.4.1.html b/release/0.4.1.html
index bd94222e97d..6f242a3ec05 100644
--- a/release/0.4.1.html
+++ b/release/0.4.1.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="0.4.1 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.4.1 (9 June 2017) This is primarily a bug fix release, but also includes some packaging and documentation improvements. Read more in the release blog post. Download Source Artifacts Git tag 46315431 Changelog New Features and Improvements ARROW-1020 - [Format] Add additional language to Schema.fbs to clarify naive vs.  [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.4.1 (9 June 2017) This is primarily a bug fix release, but also includes some packaging and documentation improvements. Read more in the release blog post. Download Source Artifacts Git tag 46315431 Changelog New Features and Improvements ARROW-1020 - [Format] Add additional language to Schema.fbs to clarify naive vs.  [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.5.0.html b/release/0.5.0.html
index cb456815b90..c1a0e35022b 100644
--- a/release/0.5.0.html
+++ b/release/0.5.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.5.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.5.0 (23 July 2017) This is a major release, with expanded features in the supported languages and additional integration test coverage between Java and C++. Read more in the release blog post. Download Source Artifacts Git tag Contributors $ git shortlog -sn apache-arrow-0.4.1..apache-arrow-0.5.0 42 Wes McKinney 22 Uwe [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.5.0 (23 July 2017) This is a major release, with expanded features in the supported languages and additional integration test coverage between Java and C++. Read more in the release blog post. Download Source Artifacts Git tag Contributors $ git shortlog -sn apache-arrow-0.4.1..apache-arrow-0.5.0 42 Wes McKinney 22 Uwe [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.6.0.html b/release/0.6.0.html
index fbeeb073ac8..51ceb468ea3 100644
--- a/release/0.6.0.html
+++ b/release/0.6.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.6.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.6.0 (14 August 2017) This is a major release. Read more in the release blog post. Download Source Artifacts Git tag Contributors $ git shortlog -sn apache-arrow-0.5.0..apache-arrow-0.6.0 48 Wes McKinney 7 siddharth 5 Matt Darwin 5 Max Risuhin 5 Philipp Moritz 4 Kouhei Sutou 3 Bryan Cutler 2 Emilio Lahr-Vivaz 2 Li Jin 2 [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.6.0 (14 August 2017) This is a major release. Read more in the release blog post. Download Source Artifacts Git tag Contributors $ git shortlog -sn apache-arrow-0.5.0..apache-arrow-0.6.0 48 Wes McKinney 7 siddharth 5 Matt Darwin 5 Max Risuhin 5 Philipp Moritz 4 Kouhei Sutou 3 Bryan Cutler 2 Emilio Lahr-Vivaz 2 Li Jin 2 [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.7.0.html b/release/0.7.0.html
index adad3b37db9..191d3142abf 100644
--- a/release/0.7.0.html
+++ b/release/0.7.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.7.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.7.0 (17 September 2017) This is a major release. Read more in the release blog post. Download Source Artifacts Git tag Contributors $ git shortlog -sn apache-arrow-0.6.0..apache-arrow-0.7.0 58 Wes McKinney 14 Kouhei Sutou 11 Philipp Moritz 7 Phillip Cloud 6 siddharth 5 Uwe L. Korn 2 Bryan Cutler 2 HorimotoYasuhiro 2 La [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.7.0 (17 September 2017) This is a major release. Read more in the release blog post. Download Source Artifacts Git tag Contributors $ git shortlog -sn apache-arrow-0.6.0..apache-arrow-0.7.0 58 Wes McKinney 14 Kouhei Sutou 11 Philipp Moritz 7 Phillip Cloud 6 siddharth 5 Uwe L. Korn 2 Bryan Cutler 2 HorimotoYasuhiro 2 La [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.7.1.html b/release/0.7.1.html
index dd16d95b751..10ec5da916c 100644
--- a/release/0.7.1.html
+++ b/release/0.7.1.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.7.1 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.7.1 (1 October 2017) This is a minor bug release. It was motivated by ARROW-1601, but see the complete changelog. Download Source Artifacts Git tag Contributors $ git shortlog -sn apache-arrow-0.7.0..apache-arrow-0.7.1 14 Wes McKinney 6 Kouhei Sutou 3 siddharth 2 Paul Taylor 2 Uwe L. Korn 1 Amir Malekpour 1 Bryan Cutle [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.7.1 (1 October 2017) This is a minor bug release. It was motivated by ARROW-1601, but see the complete changelog. Download Source Artifacts Git tag Contributors $ git shortlog -sn apache-arrow-0.7.0..apache-arrow-0.7.1 14 Wes McKinney 6 Kouhei Sutou 3 siddharth 2 Paul Taylor 2 Uwe L. Korn 1 Amir Malekpour 1 Bryan Cutle [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.8.0.html b/release/0.8.0.html
index 216a472ecc6..02908681fdf 100644
--- a/release/0.8.0.html
+++ b/release/0.8.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.8.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.8.0 (18 December 2017) This is a major release. Download Source Artifacts Git tag Contributors $ git shortlog -sn apache-arrow-0.7.1..apache-arrow-0.8.0 90 Wes McKinney 23 Phillip Cloud 21 Kouhei Sutou 13 Licht-T 12 Korn, Uwe 12 Philipp Moritz 12 Uwe L. Korn 10 Bryan Cutler 5 Li Jin 5 Robert Nishihara 4 Paul Taylor 4 s [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.8.0 (18 December 2017) This is a major release. Download Source Artifacts Git tag Contributors $ git shortlog -sn apache-arrow-0.7.1..apache-arrow-0.8.0 90 Wes McKinney 23 Phillip Cloud 21 Kouhei Sutou 13 Licht-T 12 Korn, Uwe 12 Philipp Moritz 12 Uwe L. Korn 10 Bryan Cutler 5 Li Jin 5 Robert Nishihara 4 Paul Taylor 4 s [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/0.9.0.html b/release/0.9.0.html
index aefb55a5cd0..67b15a4e9fe 100644
--- a/release/0.9.0.html
+++ b/release/0.9.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 0.9.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 0.9.0 (21 March 2018) This is a major release. Download Source Artifacts Git tag Contributors $ git shortlog -sn apache-arrow-0.8.0..apache-arrow-0.9.0 52 Wes McKinney 52 Antoine Pitrou 25 Uwe L. Korn 14 Paul Taylor 13 Kouhei Sutou 13 Phillip Cloud 9 Robert Nishihara 9 Korn, Uwe 9 Jim Crist 8 Brian Hulette 7 Philipp Mori [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 0.9.0 (21 March 2018) This is a major release. Download Source Artifacts Git tag Contributors $ git shortlog -sn apache-arrow-0.8.0..apache-arrow-0.9.0 52 Wes McKinney 52 Antoine Pitrou 25 Uwe L. Korn 14 Paul Taylor 13 Kouhei Sutou 13 Phillip Cloud 9 Robert Nishihara 9 Korn, Uwe 9 Jim Crist 8 Brian Hulette 7 Philipp Mori [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/1.0.0.html b/release/1.0.0.html
index 613af9718ae..097133ce401 100644
--- a/release/1.0.0.html
+++ b/release/1.0.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 1.0.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 1.0.0 (24 July 2020) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 771 commits from 100 distinct contributors. $ git shortlog -sn apache-arrow-0.17.0..apache-arrow-1.0.0  [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 1.0.0 (24 July 2020) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 771 commits from 100 distinct contributors. $ git shortlog -sn apache-arrow-0.17.0..apache-arrow-1.0.0  [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/1.0.1.html b/release/1.0.1.html
index 774a8d5e0c7..035715c79e1 100644
--- a/release/1.0.1.html
+++ b/release/1.0.1.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 1.0.1 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 1.0.1 (21 August 2020) This is a patch release addressing bugs in the 1.0.0 release. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 39 commits from 15 distinct contributors. $ git shortlog -sn apache-arrow-1.0.0..apache-arrow-1.0.1 9 Krisz [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 1.0.1 (21 August 2020) This is a patch release addressing bugs in the 1.0.0 release. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 39 commits from 15 distinct contributors. $ git shortlog -sn apache-arrow-1.0.0..apache-arrow-1.0.1 9 Krisz [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/10.0.0.html b/release/10.0.0.html
index 6871f44c383..bc2e44d0ab2 100644
--- a/release/10.0.0.html
+++ b/release/10.0.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 10.0.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 10.0.0 (26 October 2022) This is a major release covering more than 2 months of development. Download Source Artifacts Binary Artifacts For AlmaLinux For Amazon Linux For CentOS For C# For Debian For Python For Ubuntu Git tag Contributors This release includes 536 commits from 100 distinct contributors. $ git shortlog -s [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 10.0.0 (26 October 2022) This is a major release covering more than 2 months of development. Download Source Artifacts Binary Artifacts For AlmaLinux For Amazon Linux For CentOS For C# For Debian For Python For Ubuntu Git tag Contributors This release includes 536 commits from 100 distinct contributors. $ git shortlog -s [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/10.0.1.html b/release/10.0.1.html
index fe37cd78368..14d5ef45226 100644
--- a/release/10.0.1.html
+++ b/release/10.0.1.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 10.0.1 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 10.0.1 (22 November 2022) This is a patch release covering more than 1 months of development. Download Source Artifacts Binary Artifacts For AlmaLinux For Amazon Linux For CentOS For C# For Debian For Python For Ubuntu Git tag Contributors This release includes 34 commits from 15 distinct contributors. $ git shortlog -sn [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 10.0.1 (22 November 2022) This is a patch release covering more than 1 months of development. Download Source Artifacts Binary Artifacts For AlmaLinux For Amazon Linux For CentOS For C# For Debian For Python For Ubuntu Git tag Contributors This release includes 34 commits from 15 distinct contributors. $ git shortlog -sn [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/2.0.0.html b/release/2.0.0.html
index 7be08909b78..e4b375bcb94 100644
--- a/release/2.0.0.html
+++ b/release/2.0.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 2.0.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 2.0.0 (19 October 2020) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 511 commits from 81 distinct contributors. $ git shortlog -sn apache-arrow-1.0.0..apache-arrow-2.0.0 [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 2.0.0 (19 October 2020) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 511 commits from 81 distinct contributors. $ git shortlog -sn apache-arrow-1.0.0..apache-arrow-2.0.0 [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/3.0.0.html b/release/3.0.0.html
index 07a4e7fe77d..f52d2469fc3 100644
--- a/release/3.0.0.html
+++ b/release/3.0.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 3.0.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 3.0.0 (26 January 2021) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 648 commits from 106 distinct contributors. $ git shortlog -sn apache-arrow-2.0.0..apache-arrow-3.0. [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 3.0.0 (26 January 2021) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 648 commits from 106 distinct contributors. $ git shortlog -sn apache-arrow-2.0.0..apache-arrow-3.0. [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/4.0.0.html b/release/4.0.0.html
index cdef8eb2ad7..04fc173df50 100644
--- a/release/4.0.0.html
+++ b/release/4.0.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 4.0.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 4.0.0 (26 April 2021) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 719 commits from 114 distinct contributors. $ git shortlog -sn apache-arrow-3.0.0..apache-arrow-4.0.0  [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 4.0.0 (26 April 2021) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 719 commits from 114 distinct contributors. $ git shortlog -sn apache-arrow-3.0.0..apache-arrow-4.0.0  [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/4.0.1.html b/release/4.0.1.html
index dd5035ae59b..1700be0fd03 100644
--- a/release/4.0.1.html
+++ b/release/4.0.1.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 4.0.1 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 4.0.1 (26 May 2021) This is a patch release covering a month of development and addressing small but important bugs in the different implementations. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 21 commits from 13 distinct contributors.  [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 4.0.1 (26 May 2021) This is a patch release covering a month of development and addressing small but important bugs in the different implementations. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 21 commits from 13 distinct contributors.  [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/5.0.0.html b/release/5.0.0.html
index 78ded254d1a..27451c22e81 100644
--- a/release/5.0.0.html
+++ b/release/5.0.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 5.0.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 5.0.0 (29 July 2021) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 684 commits from 99 distinct contributors in 2 Arrow repositories. 77 David Li 43 Krisztián Szűcs 42 An [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 5.0.0 (29 July 2021) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 684 commits from 99 distinct contributors in 2 Arrow repositories. 77 David Li 43 Krisztián Szűcs 42 An [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/6.0.0.html b/release/6.0.0.html
index 31848f70d20..10892a5ceeb 100644
--- a/release/6.0.0.html
+++ b/release/6.0.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 6.0.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 6.0.0 (26 October 2021) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 592 commits from 88 distinct contributors. 58 David Li 56 Antoine Pitrou 46 Neal Richardson 42 Sutou [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 6.0.0 (26 October 2021) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For CentOS For Debian For Python For Ubuntu Git tag Contributors This release includes 592 commits from 88 distinct contributors. 58 David Li 56 Antoine Pitrou 46 Neal Richardson 42 Sutou [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/6.0.1.html b/release/6.0.1.html
index 60782076fb7..ab503bde884 100644
--- a/release/6.0.1.html
+++ b/release/6.0.1.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 6.0.1 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 6.0.1 (18 November 2021) This is a patch release covering more than 0 months of development. Download Source Artifacts Binary Artifacts For AlmaLinux For Amazon Linux For CentOS For C# For Debian For Python For Ubuntu Git tag Contributors This release includes 34 commits from 16 distinct contributors. $ git shortlog -sn  [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 6.0.1 (18 November 2021) This is a patch release covering more than 0 months of development. Download Source Artifacts Binary Artifacts For AlmaLinux For Amazon Linux For CentOS For C# For Debian For Python For Ubuntu Git tag Contributors This release includes 34 commits from 16 distinct contributors. $ git shortlog -sn  [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/7.0.0.html b/release/7.0.0.html
index 719ded1316d..4d55db94dd0 100644
--- a/release/7.0.0.html
+++ b/release/7.0.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 7.0.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 7.0.0 (3 February 2022) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For AlmaLinux For Amazon Linux For CentOS For C# For Debian For Python For Ubuntu Git tag Contributors This release includes 650 commits from 105 distinct contributors. $ git shortlog -sn [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 7.0.0 (3 February 2022) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For AlmaLinux For Amazon Linux For CentOS For C# For Debian For Python For Ubuntu Git tag Contributors This release includes 650 commits from 105 distinct contributors. $ git shortlog -sn [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/8.0.0.html b/release/8.0.0.html
index c85dcc21ef8..6c5542880f7 100644
--- a/release/8.0.0.html
+++ b/release/8.0.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 8.0.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 8.0.0 (6 May 2022) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For AlmaLinux For Amazon Linux For CentOS For C# For Debian For Python For Ubuntu Git tag Contributors This release includes 636 commits from 127 distinct contributors. $ git shortlog -sn apac [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 8.0.0 (6 May 2022) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For AlmaLinux For Amazon Linux For CentOS For C# For Debian For Python For Ubuntu Git tag Contributors This release includes 636 commits from 127 distinct contributors. $ git shortlog -sn apac [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/9.0.0.html b/release/9.0.0.html
index a18dcc15a25..1e68aa7ce03 100644
--- a/release/9.0.0.html
+++ b/release/9.0.0.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Apache Arrow 9.0.0 Release" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow 9.0.0 (3 August 2022) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For AlmaLinux For Amazon Linux For CentOS For C# For Debian For Python For Ubuntu Git tag Contributors This release includes 529 commits from 114 distinct contributors. $ git shortlog -sn a [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow 9.0.0 (3 August 2022) This is a major release covering more than 3 months of development. Download Source Artifacts Binary Artifacts For AlmaLinux For Amazon Linux For CentOS For C# For Debian For Python For Ubuntu Git tag Contributors This release includes 529 commits from 114 distinct contributors. $ git shortlog -sn a [...]
 <!-- End Jekyll SEO tag -->
 
 
diff --git a/release/index.html b/release/index.html
index 2a877638458..86493a333d8 100644
--- a/release/index.html
+++ b/release/index.html
@@ -20,13 +20,13 @@
 <meta property="og:site_name" content="Apache Arrow" />
 <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="og:type" content="article" />
-<meta property="article:published_time" content="2022-12-20T11:05:45-05:00" />
+<meta property="article:published_time" content="2022-12-26T16:19:40-05:00" />
 <meta name="twitter:card" content="summary_large_image" />
 <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
 <meta property="twitter:title" content="Releases" />
 <meta name="twitter:site" content="@ApacheArrow" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-20T11:05:45-05:00","datePublished":"2022-12-20T11:05:45-05:00","description":"Apache Arrow Releases Navigate to the release page for downloads and the changelog. 10.0.1 (22 November 2022) 10.0.0 (26 October 2022) 9.0.0 (3 August 2022) 8.0.0 (6 May 2022) 7.0.0 (3 February 2022) 6.0.1 (18 November 2021) 6.0.0 (26 October 2021) 5.0.0 (29 July 2021) 4.0.1 (26 May 2021) 4.0.0 (26 April 2021) 3.0.0 (26 January 2021) [...]
+{"@context":"https://schema.org","@type":"BlogPosting","dateModified":"2022-12-26T16:19:40-05:00","datePublished":"2022-12-26T16:19:40-05:00","description":"Apache Arrow Releases Navigate to the release page for downloads and the changelog. 10.0.1 (22 November 2022) 10.0.0 (26 October 2022) 9.0.0 (3 August 2022) 8.0.0 (6 May 2022) 7.0.0 (3 February 2022) 6.0.1 (18 November 2021) 6.0.0 (26 October 2021) 5.0.0 (29 July 2021) 4.0.1 (26 May 2021) 4.0.0 (26 April 2021) 3.0.0 (26 January 2021) [...]
 <!-- End Jekyll SEO tag -->