You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@arrow.apache.org by ap...@apache.org on 2022/06/30 12:48:49 UTC

[arrow] branch master updated: ARROW-15130: [Docs] Add glossary (#12868)

This is an automated email from the ASF dual-hosted git repository.

apitrou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new 8b298be4a3 ARROW-15130: [Docs] Add glossary (#12868)
8b298be4a3 is described below

commit 8b298be4a3ebcf8655326e44a0d2ca6dd4eb254c
Author: David Li <li...@gmail.com>
AuthorDate: Thu Jun 30 08:48:42 2022 -0400

    ARROW-15130: [Docs] Add glossary (#12868)
    
    Authored-by: David Li <li...@gmail.com>
    Signed-off-by: Antoine Pitrou <an...@python.org>
---
 docs/source/format/Glossary.rst | 202 ++++++++++++++++++++++++++++++++++++++++
 docs/source/index.rst           |   1 +
 2 files changed, 203 insertions(+)

diff --git a/docs/source/format/Glossary.rst b/docs/source/format/Glossary.rst
new file mode 100644
index 0000000000..423ebf8578
--- /dev/null
+++ b/docs/source/format/Glossary.rst
@@ -0,0 +1,202 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+========
+Glossary
+========
+
+.. glossary::
+   :sorted:
+
+   array
+   vector
+       A *contiguous*, *one-dimensional* sequence of values with known
+       length where all values have the same type.  An array consists
+       of zero or more :term:`buffers <buffer>`, a non-negative
+       length, and a :term:`data type`.  The buffers of an array are
+       laid out according to the data type as defined by the columnar
+       format.
+
+       Arrays are contiguous in the sense that iterating the values of
+       an array will iterate through a single set of buffers, even
+       though an array may consist of multiple disjoint buffers, or
+       may consist of child arrays that themselves span multiple
+       buffers.
+
+       Arrays are one-dimensional in that they are a sequence of
+       :term:`slots <slot>` or singular values, even though for some
+       data types (like structs or unions), a slot may represent
+       multiple values.
+
+       Defined by the :doc:`./Columnar`.
+
+   buffer
+       A *contiguous* region of memory with a given length.  Buffers
+       are used to store data for arrays.
+
+       Buffers may be in CPU memory, memory-mapped from a file, in
+       device (e.g. GPU) memory, etc., though not all Arrow
+       implementations support all of these possibilities.
+
+   child array
+   parent array
+       In an array of a :term:`nested type`, the parent array
+       corresponds to the :term:`parent type` and the child array(s)
+       correspond to the :term:`child type(s) <child type>`.  For
+       example, a ``List[Int32]``-type parent array has an
+       ``Int32``-type child array.
+
+   child type
+   parent type
+       In a :term:`nested type`, the nested type is the parent type,
+       and the child type(s) are its parameters.  For example, in
+       ``List[Int32]``, ``List`` is the parent type and ``Int32`` is
+       the child type.
+
+   chunked array
+       A *discontiguous*, *one-dimensional* sequence of values with
+       known length where all values have the same type.  Consists of
+       zero or more :term:`arrays <array>`, the "chunks".
+
+       Chunked arrays are discontiguous in the sense that iterating
+       the values of a chunked array may require iterating through
+       different buffers for different indices.
+
+       Not part of the columnar format; this term is specific to
+       certain language implementations of Arrow (primarily C++ and
+       its bindings).
+
+       .. seealso:: :term:`record batch`, :term:`table`
+
+   complex type
+   nested type
+       A :term:`data type` whose structure depends on one or more
+       other :term:`child data types <child type>`. For instance,
+       ``List`` is a nested type that has one child.
+
+       Two nested types are equal if and only if their child types are
+       also equal.
+
+   data type
+   type
+       A type that a value can take, such as ``Int8`` or
+       ``List[Utf8]``. The type of an array determines how its values
+       are laid out in memory according to :doc:`./Columnar`.
+
+       .. seealso:: :term:`nested type`, :term:`primitive type`
+
+   dictionary
+       An array of values that accompany a :term:`dictionary-encoded
+       <dictionary-encoding>` array.
+
+   dictionary-encoding
+       An array that stores its values as indices into a
+       :term:`dictionary` array instead of storing the values
+       directly.
+
+       .. seealso:: :ref:`dictionary-encoded-layout`
+
+   extension type
+   storage type
+       A user-defined :term:`data type` that adds additional semantics
+       to an existing data type.  This allows implementations that do
+       not support a particular extension type to still handle the
+       underlying data type (the "storage type").
+
+       For example, a UUID can be represented as a 16-byte fixed-size
+       binary type.
+
+       .. seealso:: :ref:`format_metadata_extension_types`
+
+   field
+       A column in a :term:`schema`.  Consists of a field name, a
+       :term:`data type`, a flag indicating whether the field is
+       nullable or not, and optional key-value metadata.
+
+   IPC format
+       A specification for how to serialize Arrow data, so it can be
+       sent between processes/machines, or persisted on disk.
+
+       .. seealso:: :term:`IPC file format`,
+                    :term:`IPC streaming format`
+
+   IPC file format
+   file format
+   random-access format
+       An extension of the :term:`IPC streaming format` that can be
+       used to serialize Arrow data to disk, then read it back with
+       random access to individual record batches.
+
+   IPC message
+   message
+       The IPC representation of a particular in-memory structure,
+       like a record batch or schema.
+
+   IPC streaming format
+   streaming format
+       A protocol for streaming Arrow data or for serializing data to
+       a file, consisting of a stream of :term:`IPC messages <IPC
+       message>`.
+
+   physical layout
+       A specification for how to arrange values in memory.
+
+       .. seealso:: :ref:`format_layout`
+
+   primitive type
+       A data type that does not have any child types.
+
+       .. seealso:: :term:`data type`
+
+   record batch
+       **In the :ref:`IPC format <format-ipc>`**: the primitive unit
+       of data.  A record batch consists of an ordered list of
+       :term:`buffers <buffer>` corresponding to a :term:`schema`.
+
+       **In some implementations** (primarily C++ and its bindings): a
+       *contiguous*, *two-dimensional* chunk of data.  A record batch
+       consists of an ordered collection of :term:`arrays <array>` of
+       the same length.
+
+       Like arrays, record batches are contiguous in the sense that
+       iterating the rows of a record batch will iterate through a
+       single set of buffers.
+
+   schema
+       A collection of :term:`fields <field>` with optional metadata
+       that determines all the :term:`data types <data type>` of an
+       object like a :term:`record batch` or :term:`table`.
+
+   slot
+       A single logical value within an array, i.e. a "row".
+
+   table
+       A *discontiguous*, *two-dimensional* chunk of data consisting
+       of an ordered collection of :term:`chunked arrays <chunked
+       array>`.  All chunked arrays have the same length, but may have
+       different types.  Different columns may be chunked
+       differently.
+
+       Like chunked arrays, tables are discontiguous in the sense that
+       iterating the rows of a table may require iterating through
+       different buffers for different indices.
+
+       Not part of the columnar format; this term is specific to
+       certain language implementations of Arrow (primarily C++ and
+       its bindings).
+
+       .. seealso:: :term:`chunked array`, :term:`record batch`
diff --git a/docs/source/index.rst b/docs/source/index.rst
index f8d1e71a4f..b3d232fbb8 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -80,6 +80,7 @@ target environment.**
    format/CDataInterface
    format/CStreamInterface
    format/Other
+   format/Glossary
 
 .. _toc.development: