Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/05/03 11:57:00 UTC

[jira] [Commented] (ARROW-12588) Expose JSON schema inference to Python API

    [ https://issues.apache.org/jira/browse/ARROW-12588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338332#comment-17338332 ] 

Joris Van den Bossche commented on ARROW-12588:
-----------------------------------------------

Can you give a concrete example?

Some level of schema inference also happens in the general {{pa.array()}} constructor. For example, passing a list of dicts works in simple cases:

{code}
In [2]: arr = pa.array([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}])

In [3]: arr.type
Out[3]: StructType(struct<a: int64, b: int64>)

In [4]: arr
Out[4]: 
<pyarrow.lib.StructArray object at 0x7f160695d4c0>
-- is_valid: all not null
-- child 0 type: int64
  [
    1,
    3
  ]
-- child 1 type: int64
  [
    2,
    4
  ]

{code}

> Expose JSON schema inference to Python API
> ------------------------------------------
>
>                 Key: ARROW-12588
>                 URL: https://issues.apache.org/jira/browse/ARROW-12588
>             Project: Apache Arrow
>          Issue Type: Improvement
>            Reporter: Piotr Żelasko
>            Priority: Minor
>
> When using `pyarrow.json.read_json()`, the schema is inferred automatically. It would be useful to be able to infer the schema from JSON data that is already loaded in memory (e.g. a list of dicts in Python).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)