Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/19 10:03:46 UTC

[GitHub] [arrow] jorisvandenbossche commented on issue #12885: pyarrow.json.read_json with a type inconsistent column possible?

jorisvandenbossche commented on issue #12885:
URL: https://github.com/apache/arrow/issues/12885#issuecomment-1102439997

   It might be useful to include a small actual example with some dummy data and explain how you would want to read that.
   
   In general, I don't think we have much support for mixed types in the JSON reader. In theory, such columns could be read as "union" types, although that is not very useful if the next step is writing to Parquet, since Parquet doesn't support union types.
   
   > Is it possible to still keep the default inference behaviour while also ignoring or casting values that don't conform?
   
   It's not fully clear to me what exactly you mean by "ignoring" in this case. Do you mean ignoring certain values in a column, or ignoring the full column? And if you mean ignoring certain values, what should they be replaced with (null?)
   
   > Or, casting I mean casting to higher class of the json data type, for example int casts to str in this case.
   
   Maybe we could enable this by letting the user specify an explicit schema (string in this case), although that currently also raises an error if a value doesn't match the specified type.
   
   From a quick search, someone else reported a similar issue in JIRA: https://issues.apache.org/jira/browse/ARROW-11978
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org