Posted to jira@arrow.apache.org by "Andy Grove (Jira)" <ji...@apache.org> on 2021/02/24 01:52:03 UTC

[jira] [Commented] (ARROW-10118) [Rust] [DataFusion] Add support for JSON data sources

    [ https://issues.apache.org/jira/browse/ARROW-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289465#comment-17289465 ] 

Andy Grove commented on ARROW-10118:
------------------------------------

Well, we could add schema inference, but it could be slow for large JSON
files, especially where the schema varies between objects and where there
are nested structs with varying schemas.
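
To make that cost concrete, here is a minimal sketch of full-scan schema inference. It assumes records have already been parsed into a hypothetical `Json` enum (not the Arrow JSON reader's types). Every record has to be visited, and nested objects with differing keys are merged field by field, so the work grows with both file size and schema variability. The `Ty` enum, the `merge` rules and the fallback to `Utf8` on conflicting types are illustrative assumptions, not Arrow's or DataFusion's actual behavior.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for an already-parsed JSON value (not serde_json or Arrow types).
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Json {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
    Object(BTreeMap<String, Json>),
}

// Hypothetical inferred column type, loosely modeled on Arrow data types.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Null,
    Boolean,
    Int64,
    Float64,
    Utf8,
    Struct(BTreeMap<String, Ty>),
}

// Type of a single value; objects become structs, recursively.
fn type_of(v: &Json) -> Ty {
    match v {
        Json::Null => Ty::Null,
        Json::Bool(_) => Ty::Boolean,
        Json::Int(_) => Ty::Int64,
        Json::Float(_) => Ty::Float64,
        Json::Str(_) => Ty::Utf8,
        Json::Object(m) => Ty::Struct(m.iter().map(|(k, v)| (k.clone(), type_of(v))).collect()),
    }
}

// Widen two types so values of either fit; struct fields are merged by name.
fn merge(a: Ty, b: Ty) -> Ty {
    match (a, b) {
        (x, y) if x == y => x,
        (Ty::Null, y) => y,
        (x, Ty::Null) => x,
        (Ty::Int64, Ty::Float64) | (Ty::Float64, Ty::Int64) => Ty::Float64,
        (Ty::Struct(mut fa), Ty::Struct(fb)) => {
            for (k, tb) in fb {
                let merged = match fa.remove(&k) {
                    Some(ta) => merge(ta, tb),
                    None => tb,
                };
                fa.insert(k, merged);
            }
            Ty::Struct(fa)
        }
        // Assumed policy: irreconcilable types fall back to a string column.
        _ => Ty::Utf8,
    }
}

// Full scan: every record is visited before the schema is known.
fn infer_schema(records: &[Json]) -> Ty {
    records.iter().map(type_of).fold(Ty::Null, merge)
}

// Hypothetical helper to build an object from key/value pairs.
fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Object(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

fn main() {
    let records = vec![
        obj(vec![("a", Json::Int(1)), ("c", obj(vec![("d", Json::Bool(true))]))]),
        obj(vec![("a", Json::Float(2.5)), ("b", Json::Str("x".to_string()))]),
        obj(vec![("a", Json::Null), ("c", obj(vec![("e", Json::Int(7))]))]),
    ];
    // Prints: Struct({"a": Float64, "b": Utf8, "c": Struct({"d": Boolean, "e": Int64})})
    println!("{:?}", infer_schema(&records));
}
```

Note that the final type of `a` and the shape of `c` are only known after the last record is read, which is why a single late-arriving field or type change forces a complete pass.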

Maybe there are two different stories here.

1) Support JSON using schema inference

2) Support JSON in a schemaless way. For example, if I run "SELECT a, b,
c.d.e.f ...", I would expect to get NULLs for any of these attributes that
do not exist in a given row.
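
The schemaless option can be sketched as a per-row path lookup: each requested column, such as `c.d.e.f`, is resolved against the row, and any missing segment yields `None`, standing in for SQL NULL. This is a hypothetical illustration using a stand-in `Json` enum and made-up helpers (`get_path`, `project`), not an existing DataFusion API. Treating an explicit JSON `null` the same as a missing key is also an assumption.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for an already-parsed JSON value (not serde_json or Arrow types).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Int(i64),
    Str(String),
    Object(BTreeMap<String, Json>),
}

// Resolve a dotted path such as "c.d.e.f" against one row. Returns None (standing in
// for SQL NULL) if any segment is missing or an intermediate value is not an object.
fn get_path<'a>(row: &'a Json, path: &str) -> Option<&'a Json> {
    let mut current = row;
    for segment in path.split('.') {
        current = match current {
            Json::Object(fields) => fields.get(segment)?,
            _ => return None,
        };
    }
    // Assumption: an explicit JSON null is treated like a missing attribute.
    match current {
        Json::Null => None,
        v => Some(v),
    }
}

// Project the requested columns from every row without any declared schema.
fn project<'a>(rows: &'a [Json], columns: &[&str]) -> Vec<Vec<Option<&'a Json>>> {
    rows.iter()
        .map(|row| columns.iter().map(|c| get_path(row, c)).collect())
        .collect()
}

// Hypothetical helper to build an object from key/value pairs.
fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Object(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

fn main() {
    // Row 2 carries the nested path c.d.e.f; row 1 lacks it entirely.
    let f_obj = obj(vec![("f", Json::Bool(true))]);
    let e_obj = obj(vec![("e", f_obj)]);
    let d_obj = obj(vec![("d", e_obj)]);
    let rows = vec![
        obj(vec![("a", Json::Int(1)), ("b", Json::Str("x".to_string()))]),
        obj(vec![("a", Json::Int(2)), ("c", d_obj)]),
    ];
    // Prints:
    // [Some(Int(1)), Some(Str("x")), None]
    // [Some(Int(2)), None, Some(Bool(true))]
    for row in project(&rows, &["a", "b", "c.d.e.f"]) {
        println!("{:?}", row);
    }
}
```

The trade-off is that no upfront scan is needed, but the output column types are not fixed in advance, so a real engine would still have to decide how to represent a column whose values differ in type from row to row.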

On Fri, Nov 27, 2020 at 4:00 PM Neville Dipale (Jira) <ji...@apache.org>
wrote:

> [Rust] [DataFusion] Add support for JSON data sources
> -----------------------------------------------------
>
>                 Key: ARROW-10118
>                 URL: https://issues.apache.org/jira/browse/ARROW-10118
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Rust, Rust - DataFusion
>            Reporter: Andy Grove
>            Priority: Major
>
> Arrow already has a JSON reader and it would be nice to integrate this with DataFusion so that queries can be run against JSON files.
> This would probably not be trivial though since we would need to add support for schemaless data sources (it isn't practical to parse the JSON files first to extract the schema).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)