Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/21 19:36:20 UTC

[GitHub] [arrow-cookbook] drabastomek commented on a change in pull request #1: Initial content for Arrow Cookbook for Python and R

drabastomek commented on a change in pull request #1:
URL: https://github.com/apache/arrow-cookbook/pull/1#discussion_r674256677



##########
File path: python/source/data.rst
##########
@@ -0,0 +1,139 @@
+=================
+Data Manipulation
+=================
+
+Recipes related to filtering or transforming data in
+arrays and tables.
+
+.. contents::
+
+See :ref:`compute` for a complete list of all available compute functions
+
+Computing Mean/Min/Max values of an array
+=========================================
+
+Arrow provides compute functions that can be applied to arrays.
+Those compute functions are exposed through the :mod:`arrow.compute`
+module.
+
+.. testsetup::
+
+  import numpy as np
+  import pyarrow as pa
+
+  arr = pa.array(np.arange(100))
+
+Given an array with all numbers from 0 to 100
+
+.. testcode::
+
+  print(f"{arr[0]} .. {arr[-1]}")
+
+.. testoutput::
+
+  0 .. 99
+
+We can compute the ``mean`` using the :func:`arrow.compute.mean`
+function
+
+.. testcode::
+
+  import pyarrow.compute as pc
+
+  mean = pc.mean(arr)
+  print(mean)
+
+.. testoutput::
+
+  49.5
+
+And the ``min`` and ``max`` using the :func:`arrow.compute.min_max`
+function
+
+.. testcode::
+
+  import pyarrow.compute as pc
+
+  min_max = pc.min_max(arr)
+  print(min_max)
+
+.. testoutput::
+
+  {'min': 0, 'max': 99}
+
+Counting Occurrences of Elements
+================================
+
+Arrow provides compute functions that can be applied to arrays,
+those compute functions are exposed through the :mod:`arrow.compute`
+module.
+
+.. testsetup::
+
+  import pyarrow as pa
+
+  nums_arr = pa.array(list(range(10))*10)
+
+Given an array with all numbers from 0 to 10 repeated 10 times

Review comment:
       Similar comment here: the array holds the 10 numbers from 0 to 9, repeated 10 times, rather than all numbers between 0 and 10.
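
A quick check of the point above, sketched with Python built-ins only. The list expression is the one the testsetup passes to `pa.array`; the names `nums` and `distinct` are illustrative and not part of the PR.

```python
# Same list expression the testsetup wraps in pa.array(...)
nums = list(range(10)) * 10

# Distinct values, sorted, to see the actual bounds
distinct = sorted(set(nums))

print(len(nums))                  # 100 elements in total
print(distinct[0], distinct[-1])  # 0 9
print(len(distinct))              # 10 distinct values, so 10 never appears
```

Since `range(10)` stops before 10, wording along the lines of "the numbers from 0 to 9 repeated 10 times" would match the data.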

##########
File path: python/source/data.rst
##########
@@ -0,0 +1,139 @@
+=================
+Data Manipulation
+=================
+
+Recipes related to filtering or transforming data in
+arrays and tables.
+
+.. contents::
+
+See :ref:`compute` for a complete list of all available compute functions
+
+Computing Mean/Min/Max values of an array
+=========================================
+
+Arrow provides compute functions that can be applied to arrays.
+Those compute functions are exposed through the :mod:`arrow.compute`
+module.
+
+.. testsetup::
+
+  import numpy as np
+  import pyarrow as pa
+
+  arr = pa.array(np.arange(100))
+
+Given an array with all numbers from 0 to 100

Review comment:
       We should change this: `np.arange(100)` yields 100 numbers, 0 through 99, whereas this sentence might lead someone to think the array includes every number from 0 to 100 inclusive, i.e. 101 numbers.
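
A minimal sketch of the off-by-one in question, assuming only NumPy is installed. The recipe itself wraps the result in `pa.array`, which is left out here; the name `values` is illustrative.

```python
import numpy as np

# np.arange(stop) covers the half-open interval [0, stop)
values = np.arange(100)

print(len(values))            # 100
print(values[0], values[-1])  # 0 99
print(100 in values)          # False
```

The last line shows that 100 itself is never part of the array.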

##########
File path: python/source/io.rst
##########
@@ -0,0 +1,424 @@
+========================
+Reading and Writing Data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Write a Parquet file
+====================
+
+.. testsetup::
+
+    import numpy as np
+    import pyarrow as pa
+
+    arr = pa.array(np.arange(100))
+
+Given an array with all numbers from 0 to 100
+
+.. testcode::
+
+    print(f"{arr[0]} .. {arr[-1]}")
+
+.. testoutput::
+
+    0 .. 99
+
+To write it to a Parquet file, as Parquet is a columnar format,
+we must create a :class:`pyarrow.Table` out of it,
+so that we get a table of a single column which can then be
+written to a Parquet file. 
+
+.. testcode::
+
+    table = pa.Table.from_arrays([arr], names=["col1"])
+
+Once we have a table, it can be written to a Parquet File 
+using the functions provided by the ``pyarrow.parquet`` module
+
+.. testcode::
+
+    import pyarrow.parquet as pq
+
+    pq.write_table(table, "example.parquet", compression=None)
+
+Reading a Parquet file
+======================
+
+Given a Parquet file, it can be read back to a :class:`pyarrow.Table`
+by using :func:`pyarrow.parquet.read_table` function
+
+.. testcode::
+
+    import pyarrow.parquet as pq
+
+    table = pq.read_table("example.parquet")
+
+The resulting table will contain the same columns that existed in
+the parquet file as :class:`ChunkedArray`
+
+.. testcode::
+
+    print(table)
+
+    col1 = table["col1"]
+    print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
+
+.. testoutput::
+
+    pyarrow.Table
+    col1: int64
+    ChunkedArray = 0 .. 99
+
+Reading a subset of Parquet data
+================================
+
+When reading a Parquet file with :func:`pyarrow.parquet.read_table` 
+it is possible to restrict which Columns and Rows will be read
+into memory by using the ``filters`` and ``columns`` arguments
+
+.. testcode::
+
+    import pyarrow.parquet as pq
+
+    table = pq.read_table("example.parquet", 
+                          columns=["col1"],
+                          filters=[
+                              ("col1", ">", 5),
+                              ("col1", "<", 10),
+                          ])
+
+The resulting table will contain only the projected columns
+and filtered rows. Refer to :func:`pyarrow.parquet.read_table`
+documentation for details about the syntax for filters.
+
+.. testcode::
+
+    print(table)
+
+    col1 = table["col1"]
+    print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
+
+.. testoutput::
+
+    pyarrow.Table
+    col1: int64
+    ChunkedArray = 6 .. 9
+    
+
+Saving Arrow Arrays to disk
+===========================
+
+Apart from using arrow to read and save common file formats like Parquet,
+it is possible to dump data in the raw arrow format which allows 
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format.
+
+Given an array with all numbers from 0 to 100

Review comment:
       I'd change this to read: "Given the first 100 numbers starting from 0"

##########
File path: python/source/data.rst
##########
@@ -0,0 +1,139 @@
+=================
+Data Manipulation
+=================
+
+Recipes related to filtering or transforming data in
+arrays and tables.
+
+.. contents::
+
+See :ref:`compute` for a complete list of all available compute functions
+
+Computing Mean/Min/Max values of an array
+=========================================
+
+Arrow provides compute functions that can be applied to arrays.
+Those compute functions are exposed through the :mod:`arrow.compute`
+module.
+
+.. testsetup::
+
+  import numpy as np
+  import pyarrow as pa
+
+  arr = pa.array(np.arange(100))
+
+Given an array with all numbers from 0 to 100
+
+.. testcode::
+
+  print(f"{arr[0]} .. {arr[-1]}")
+
+.. testoutput::
+
+  0 .. 99
+
+We can compute the ``mean`` using the :func:`arrow.compute.mean`
+function
+
+.. testcode::
+
+  import pyarrow.compute as pc
+
+  mean = pc.mean(arr)
+  print(mean)
+
+.. testoutput::
+
+  49.5
+
+And the ``min`` and ``max`` using the :func:`arrow.compute.min_max`
+function
+
+.. testcode::
+
+  import pyarrow.compute as pc
+
+  min_max = pc.min_max(arr)
+  print(min_max)
+
+.. testoutput::
+
+  {'min': 0, 'max': 99}
+
+Counting Occurrences of Elements
+================================
+
+Arrow provides compute functions that can be applied to arrays,
+those compute functions are exposed through the :mod:`arrow.compute`
+module.
+
+.. testsetup::
+
+  import pyarrow as pa
+
+  nums_arr = pa.array(list(range(10))*10)
+
+Given an array with all numbers from 0 to 10 repeated 10 times
+
+.. testcode::
+
+  print(f"LEN: {len(nums_arr)}, MIN/MAX: {nums_arr[0]} .. {nums_arr[-1]}")
+
+.. testoutput::
+
+  LEN: 100, MIN/MAX: 0 .. 9
+
+We can count occurences of all entries in the array using the
+:func:`arrow.compute.value_counts` function
+
+.. testcode::
+
+  import pyarrow.compute as pc
+
+  counts = pc.value_counts(nums_arr)
+  for pair in counts:
+    print(pair)
+
+.. testoutput::
+
+  {'values': 0, 'counts': 10}
+  {'values': 1, 'counts': 10}
+  {'values': 2, 'counts': 10}
+  {'values': 3, 'counts': 10}
+  {'values': 4, 'counts': 10}
+  {'values': 5, 'counts': 10}
+  {'values': 6, 'counts': 10}
+  {'values': 7, 'counts': 10}
+  {'values': 8, 'counts': 10}
+  {'values': 9, 'counts': 10}
+
+Applying arithmetic functions to arrays.
+=========================================
+
+The compute functions in :mod:`arrow.compute` also include
+common transformations such as arithmetic functions.
+
+Given an array with all numbers from 0 to 100

Review comment:
       Same here: the array spans 0 .. 99, so it holds 100 numbers and does not include 100.



