You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@arrow.apache.org by th...@apache.org on 2021/09/15 11:25:13 UTC

[arrow-cookbook] branch main updated: Specifying schemas for arrays and tables (#73)

This is an automated email from the ASF dual-hosted git repository.

thisisnic pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git


The following commit(s) were added to refs/heads/main by this push:
     new a60dd10  Specifying schemas for arrays and tables (#73)
a60dd10 is described below

commit a60dd100bd2210d9e00fd6d924342979b7a9d392
Author: Alessandro Molina <am...@turbogears.org>
AuthorDate: Wed Sep 15 13:25:08 2021 +0200

    Specifying schemas for arrays and tables (#73)
---
 python/source/index.rst  |   1 +
 python/source/schema.rst | 111 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 112 insertions(+)

diff --git a/python/source/index.rst b/python/source/index.rst
index ff4ae7c..9fe680c 100644
--- a/python/source/index.rst
+++ b/python/source/index.rst
@@ -17,6 +17,7 @@ serve as robust and well performing solutions to those tasks.
 
    io
    create
+   schema
    data
 
 Indices and tables
diff --git a/python/source/schema.rst b/python/source/schema.rst
new file mode 100644
index 0000000..c3cb009
--- /dev/null
+++ b/python/source/schema.rst
@@ -0,0 +1,111 @@
+===================
+Working with Schema
+===================
+
+Arrow automatically infers the most appropriate data type when reading in data
+or converting Python objects to Arrow objects.  
+
+However, you might want to manually tell Arrow which data types to 
+use, for example, to ensure interoperability with databases and data warehouse 
+systems.  This chapter includes recipes for dealing with schemas.
+
+.. contents::
+
+Setting the data type of an Arrow Array
+=======================================
+
+If you have an existing array and want to change its data type,
+that can be done through the ``cast`` function:
+
+.. testcode::
+
+    import pyarrow as pa
+
+    arr = pa.array([1, 2, 3, 4, 5])
+    print(arr.type)
+
+.. testoutput::
+
+    int64
+
+.. testcode::
+
+    arr = arr.cast(pa.int8())
+    print(arr.type)
+
+.. testoutput::
+
+    int8
+
+You can also create an array of the requested type by providing
+the type at array creation
+
+.. testcode::
+
+    import pyarrow as pa
+
+    arr = pa.array([1, 2, 3, 4, 5], type=pa.int8())
+    print(arr.type)
+
+.. testoutput::
+
+    int8
+
+Setting the schema of a Table
+=============================
+
+Tables detain multiple columns, each with its own name
+and type. The union of types and names is what defines a schema.
+
+A schema in Arrow can be defined using :meth:`pyarrow.schema`
+
+.. testcode::
+
+    import pyarrow as pa
+
+    schema = pa.schema([
+        ("col1", pa.int8()),
+        ("col2", pa.string()),
+        ("col3", pa.float64())
+    ])
+
+The schema can then be provided to a table when created:
+
+.. testcode::
+
+    table = pa.table([
+        [1, 2, 3, 4, 5],
+        ["a", "b", "c", "d", "e"],
+        [1.0, 2.0, 3.0, 4.0, 5.0]
+    ], schema=schema)
+
+    print(table)
+
+.. testoutput::
+
+    pyarrow.Table
+    col1: int8
+    col2: string
+    col3: double
+
+Like for arrays, it's possible to cast tables to different schemas
+as far as they are compatible
+
+.. testcode::
+
+    schema_int32 = pa.schema([
+        ("col1", pa.int32()),
+        ("col2", pa.string()),
+        ("col3", pa.float64())
+    ])
+
+    table = table.cast(schema_int32)
+
+    print(table)
+
+.. testoutput::
+
+    pyarrow.Table
+    col1: int32
+    col2: string
+    col3: double
\ No newline at end of file