Posted to issues@arrow.apache.org by "David Lee (JIRA)" <ji...@apache.org> on 2018/12/14 19:32:00 UTC
[jira] [Created] (ARROW-4032) [Python] New pyarrow.Table.from_pydict() function
David Lee created ARROW-4032:
--------------------------------
Summary: [Python] New pyarrow.Table.from_pydict() function
Key: ARROW-4032
URL: https://issues.apache.org/jira/browse/ARROW-4032
Project: Apache Arrow
Issue Type: Task
Components: Python
Reporter: David Lee
Here's a proposal to create a pyarrow.Table.from_pydict() function.
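Right now only pyarrow.Table.from_pandas() exists, and there are inherent problems using Pandas with NULL support for Int(s) and Boolean(s): an integer column that contains missing values is promoted to float, and a boolean column is promoted to object.
See the section "{{NaN}}, Integer {{NA}} values and {{NA}} type promotions" in the Pandas gotchas:
[http://pandas.pydata.org/pandas-docs/version/0.23.4/gotchas.html]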
Sample Python code showing how this would work:
{code:python}
import pyarrow as pa
from datetime import datetime

# Rows are plain dicts; any key may be missing from any row.
pylist = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5, "city": "San Francisco"},
    {"name": "Pam", "age": 7, "birthday": datetime.now()},
]

def from_pydict(pylist, columns):
    # Build one Arrow array per requested column; missing keys become nulls.
    arrow_columns = []
    for column in columns:
        arrow_columns.append(pa.array([v.get(column) for v in pylist]))
    return pa.Table.from_arrays(arrow_columns, columns)

# 'dummy' appears in no row, so that column is entirely null.
test = from_pydict(pylist, ["name", "age", "city", "birthday", "dummy"])
{code}
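To make the null handling concrete, here is a minimal pure-Python sketch of the pivot step that the proposed function performs before handing each column to pa.array(). The helper name rows_to_columns and the fixed birthday value are illustrative assumptions and are not part of pyarrow.

```python
from datetime import datetime


def rows_to_columns(rows, columns):
    """Pivot a list of row dicts into {column name: list of values}.

    Keys missing from a row are filled with None, which pa.array()
    would then store as a null. (Hypothetical helper, for illustration.)
    """
    return {column: [row.get(column) for row in rows] for column in columns}


rows = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5, "city": "San Francisco"},
    {"name": "Pam", "age": 7, "birthday": datetime(2018, 12, 14)},
]

pivoted = rows_to_columns(rows, ["name", "age", "city", "birthday", "dummy"])

# Show each column as it would be passed to pa.array().
for name, values in pivoted.items():
    print(name, values)
```

In this sketch the age values remain Python ints and the absent entries are plain None, so nothing has to be promoted to float or object along the way.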