Posted to commits@arrow.apache.org by we...@apache.org on 2019/06/26 00:58:27 UTC

[arrow] branch master updated: ARROW-4847: [Python] Add pyarrow.table factory function

This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new ff78d30  ARROW-4847: [Python] Add pyarrow.table factory function
ff78d30 is described below

commit ff78d30a65dd834a6518181a31f0e4a75d327d8b
Author: Joris Van den Bossche <jo...@gmail.com>
AuthorDate: Tue Jun 25 19:58:19 2019 -0500

    ARROW-4847: [Python] Add pyarrow.table factory function
    
    Rudimentary implementation for https://issues.apache.org/jira/browse/ARROW-4847.
    
    For now I made the minimal choices:
    
    * Which formats to support? For now I only included DataFrame and dictionary, and not lists of arrays or lists of batches, because DataFrame/dictionary are unambiguous and don't require additional keywords to be interpreted (e.g. a list of arrays needs a list of names, or the schema).
      `from_batches` could probably be supported easily (e.g. by checking whether the first element of the list is a RecordBatch), but then it might also be nice to support a list of DataFrames interpreted as a list of batches.
    
    * Which keywords to expose? For now I only included the keywords common to the different constructors being dispatched to (in practice only `schema`), with the idea that you can always use the specialized constructors if you need more control.
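    The dispatch described above is just an isinstance check over the supported input types. A minimal stand-alone sketch of the same pattern (plain Python stand-ins, not the real pyarrow classes; `make_table` and its tagged return value are hypothetical):

    ```python
    def make_table(data, schema=None):
        """Sketch of the pyarrow.table dispatch: pick a constructor by input type.

        Hypothetical stand-in: returns a (constructor-name, schema) tag instead
        of a real Table; the actual factory calls Table.from_pydict or
        Table.from_pandas with the same keyword.
        """
        if isinstance(data, dict):
            # mapping of column names to values -> the Table.from_pydict path
            return ("from_pydict", schema)
        # anything else is rejected (the real check is for pandas.DataFrame)
        raise TypeError("Expected pandas DataFrame or python dictionary")


    print(make_table({"a": [1, 2, 3]}))  # a dict dispatches to the pydict path
    ```

    Exposing only the shared `schema` keyword keeps the dispatch trivial: any input-specific option stays with its specialized constructor.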
    
    Author: Joris Van den Bossche <jo...@gmail.com>
    Author: Wes McKinney <we...@apache.org>
    
    Closes #4601 from jorisvandenbossche/ARROW-4847-table-factory and squashes the following commits:
    
    790c342a1 <Wes McKinney> Change ValueError to TypeError
    aa5171203 <Joris Van den Bossche> Merge remote-tracking branch 'upstream/master' into ARROW-4847-table-factory
    69ab78a04 <Joris Van den Bossche> ARROW-4847:  Add pyarrow.table factory function
---
 docs/source/python/api/tables.rst  |  1 +
 python/pyarrow/__init__.py         |  2 +-
 python/pyarrow/table.pxi           | 29 +++++++++++++++++++++++++++++
 python/pyarrow/tests/test_table.py | 23 +++++++++++++++++++++++
 4 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/docs/source/python/api/tables.rst b/docs/source/python/api/tables.rst
index 5a229d2..9d350a4 100644
--- a/docs/source/python/api/tables.rst
+++ b/docs/source/python/api/tables.rst
@@ -28,6 +28,7 @@ Factory Functions
 .. autosummary::
    :toctree: ../generated/
 
+   table
    column
    chunked_array
    concat_tables
diff --git a/python/pyarrow/__init__.py b/python/pyarrow/__init__.py
index e4d7446..487065c 100644
--- a/python/pyarrow/__init__.py
+++ b/python/pyarrow/__init__.py
@@ -65,7 +65,7 @@ from pyarrow.lib import (null, bool_,
                          Schema,
                          schema,
                          Array, Tensor,
-                         array, chunked_array, column,
+                         array, chunked_array, column, table,
                          infer_type, from_numpy_dtype,
                          NullArray,
                          NumericArray, IntegerArray, FloatingPointArray,
diff --git a/python/pyarrow/table.pxi b/python/pyarrow/table.pxi
index 688050b..3f08e8d 100644
--- a/python/pyarrow/table.pxi
+++ b/python/pyarrow/table.pxi
@@ -1576,6 +1576,35 @@ def _reconstruct_table(arrays, schema):
     return Table.from_arrays(arrays, schema=schema)
 
 
+def table(data, schema=None):
+    """
+    Create a pyarrow.Table from a Python object (table-like objects such as
+    a pandas DataFrame or a dictionary).
+
+    Parameters
+    ----------
+    data : pandas.DataFrame, dict
+        A DataFrame or a mapping of strings to Arrays or Python lists.
+    schema : Schema, default None
+        The expected schema of the Arrow Table. If not passed, will be
+        inferred from the data.
+
+    Returns
+    -------
+    Table
+
+    See Also
+    --------
+    Table.from_pandas, Table.from_pydict
+    """
+    if isinstance(data, dict):
+        return Table.from_pydict(data, schema=schema)
+    elif isinstance(data, _pandas_api.pd.DataFrame):
+        return Table.from_pandas(data, schema=schema)
+    else:
+        raise TypeError("Expected pandas DataFrame or python dictionary")
+
+
 def concat_tables(tables):
     """
     Perform zero-copy concatenation of pyarrow.Table objects. Raises exception
diff --git a/python/pyarrow/tests/test_table.py b/python/pyarrow/tests/test_table.py
index c7bb3f5..7106a3f 100644
--- a/python/pyarrow/tests/test_table.py
+++ b/python/pyarrow/tests/test_table.py
@@ -992,3 +992,26 @@ def test_table_from_pydict():
     # Cannot pass both schema and metadata
     with pytest.raises(ValueError):
         pa.Table.from_pydict(data, schema=schema, metadata=metadata)
+
+
+@pytest.mark.pandas
+def test_table_factory_function():
+    import pandas as pd
+
+    d = {'a': [1, 2, 3], 'b': ['a', 'b', 'c']}
+    schema = pa.schema([('a', pa.int32()), ('b', pa.string())])
+
+    df = pd.DataFrame(d)
+    table1 = pa.table(df)
+    table2 = pa.Table.from_pandas(df)
+    assert table1.equals(table2)
+    table1 = pa.table(df, schema=schema)
+    table2 = pa.Table.from_pandas(df, schema=schema)
+    assert table1.equals(table2)
+
+    table1 = pa.table(d)
+    table2 = pa.Table.from_pydict(d)
+    assert table1.equals(table2)
+    table1 = pa.table(d, schema=schema)
+    table2 = pa.Table.from_pydict(d, schema=schema)
+    assert table1.equals(table2)