Posted to jira@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2022/03/19 13:40:00 UTC
[jira] [Updated] (ARROW-15978) [C++] Have JSON reader treat mixed singleton/array fields as arrays.
[ https://issues.apache.org/jira/browse/ARROW-15978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Li updated ARROW-15978:
-----------------------------
Summary: [C++] Have JSON reader treat mixed singleton/array fields as arrays. (was: Have JSON reader treat mixed singleton/array fields as arrays.)
> [C++] Have JSON reader treat mixed singleton/array fields as arrays.
> --------------------------------------------------------------------
>
> Key: ARROW-15978
> URL: https://issues.apache.org/jira/browse/ARROW-15978
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++, Python, R
> Affects Versions: 7.0.0
> Reporter: Ben Schmidt
> Priority: Minor
>
> I frequently encounter real-world files that mix array types and singletons across entries for a single field. For example, consider an ndjson file consisting of:
>
> {code:json}
> {"author": "Hunter Thompson"}
> {"author": ["Bob Woodward", "Carl Bernstein"]}
> {code}
> Widely used specs encourage writing JSON like this, where a singleton isn't wrapped in array brackets. For example, the 'target' field in the w3 annotation model [may be a string or an array of strings.|https://www.w3.org/TR/annotation-model/#:~:text=The%20body%20and/or%20target%20relationships%20of%20the%20Annotation%20may%20be%20arrays%20rather%20than%20a%20single%20object.]
>
> Currently I see no way to read this sort of data with the C++ JSON reader. It would be nice if Arrow's ndjson reader could do two things to support data like this:
> # When inferring types, silently promote entries of type <T> to type <T[]> if the column is mixed;
> # When passed an explicit schema that declares the field as a list type (read into a ListArray), wrap every instance of the field in an array if it isn't one already.
> My sense is that this might be pretty simple.
> Thanks to everyone who works on this project.
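The two promotion rules proposed above can be sketched as a preprocessing step in plain Python, outside Arrow. This is a minimal sketch that assumes only top-level fields matter and that element types already agree. The helpers `infer_list_fields` and `promote_singletons` are hypothetical names used for illustration. They are not part of the Arrow JSON reader's API, and this is not how the C++ reader would implement the change.

```python
import json
from typing import Any, Dict, List, Set

Record = Dict[str, Any]


def infer_list_fields(records: List[Record]) -> Set[str]:
    """Rule 1 (inference), hypothetical helper: a top-level field is treated
    as list-typed if any of its values is a JSON array, i.e. a column mixing
    T and T[] is promoted to T[]. Fields that are never arrays are left alone."""
    list_fields: Set[str] = set()
    for rec in records:
        for key, value in rec.items():
            if isinstance(value, list):
                list_fields.add(key)
    return list_fields


def promote_singletons(records: List[Record], list_fields: Set[str]) -> List[Record]:
    """Rule 2 (explicit or inferred schema), hypothetical helper: wrap every
    non-null, non-array value of a list-typed field in a one-element array.
    Nulls and missing keys are left as they are."""
    out: List[Record] = []
    for rec in records:
        fixed = dict(rec)  # shallow copy so the input records are not mutated
        for key in list_fields:
            value = fixed.get(key)
            if value is not None and not isinstance(value, list):
                fixed[key] = [value]
        out.append(fixed)
    return out


if __name__ == "__main__":
    # The two-line ndjson example from the ticket.
    ndjson = (
        '{"author": "Hunter Thompson"}\n'
        '{"author": ["Bob Woodward", "Carl Bernstein"]}\n'
    )
    records = [json.loads(line) for line in ndjson.splitlines() if line.strip()]
    fields = infer_list_fields(records)
    for rec in promote_singletons(records, fields):
        print(json.dumps(rec))
```

Run on the ticket's example, the script prints `author` as an array on both lines. The normalized records could then be re-serialized as ndjson in which the field is consistently a list. A native implementation inside the reader would also have to check that each singleton's type matches the list's element type, which this sketch skips.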
--
This message was sent by Atlassian Jira
(v8.20.1#820001)