Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/14 13:10:12 UTC

[GitHub] [arrow] hugofragata opened a new issue, #12885: pyarrow.json.read_json with a type inconsistent column possible?

hugofragata opened a new issue, #12885:
URL: https://github.com/apache/arrow/issues/12885

   Hi all.
   
   My current use case for pyarrow is to read json and write the data as parquet.
   I'm having issues on a specific json column that contains both integer and string values.
   
   I could use pyarrow.json.ParseOptions and pass it an explicit_schema; however, there are several schemas that have this issue and I'd like a single solution for all of them.
   
   Is it possible to still keep the default inference behaviour while also ignoring or casting values that don't conform?
   By ignoring I mean something similar to unexpected_field_behavior="ignore", but without an explicit_schema. By casting I mean casting to a higher class of the JSON data type; for example, int is cast to str in this case.
   
   Cheers!
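
The "cast to a higher class" idea can be sketched as a preprocessing step that runs before the data reaches pyarrow. This is only an illustration using the standard library, not a pyarrow feature; the function name `promote_mixed_columns`, the sample data, and the rule "a column holding both int and str becomes str" are assumptions made for the example.

```python
import json


def promote_mixed_columns(ndjson_text):
    """Cast every value of a column to str when the column mixes ints and strs."""
    rows = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]

    # Collect the set of Python types seen per top-level key (None is ignored).
    seen = {}
    for row in rows:
        for key, value in row.items():
            if value is not None:
                seen.setdefault(key, set()).add(type(value))

    # A column counts as "mixed" here only if it holds exactly int and str values.
    mixed = {key for key, types in seen.items() if types == {int, str}}

    # Rewrite the mixed columns so that every non-null value is a string.
    for row in rows:
        for key in mixed:
            if row.get(key) is not None:
                row[key] = str(row[key])

    # Emit newline-delimited JSON again, one object per line.
    return "\n".join(json.dumps(row) for row in rows) + "\n"


# Hypothetical sample: "id" is an int in one row and a str in the other.
sample = '{"id": 1, "name": "a"}\n{"id": "x2", "name": "b"}\n'
cleaned = promote_mixed_columns(sample)
```

The cleaned text has a consistent type per column, so it could then be encoded to bytes and passed to pyarrow.json.read_json through a file-like object such as io.BytesIO. The sketch loads all rows into memory and only looks at top-level keys, which may not suit large or nested inputs.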


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] hugofragata closed issue #12885: pyarrow.json.read_json with a type inconsistent column possible?

Posted by GitBox <gi...@apache.org>.
hugofragata closed issue #12885: pyarrow.json.read_json with a type inconsistent column possible?
URL: https://github.com/apache/arrow/issues/12885




[GitHub] [arrow] jorisvandenbossche commented on issue #12885: pyarrow.json.read_json with a type inconsistent column possible?

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on issue #12885:
URL: https://github.com/apache/arrow/issues/12885#issuecomment-1102439997

   It might be useful to include a small actual example with some dummy data and explain how you would want to read that.
   
   Now, in general, I don't think we have much support for mixed types in JSON. In theory, those could also be read as "union" types (although that's not so useful for writing to Parquet as a next step, since Parquet doesn't support unions).
   
   > Is it possible to still keep the default inference behaviour while also ignoring or casting values that don't conform?
   
   It's not fully clear to me what exactly you mean by "ignoring" in this case. Do you mean ignoring certain values in a column, or ignoring a full column? And if ignoring certain values, what should they be replaced with? (null?)
   
   > Or, casting I mean casting to higher class of the json data type, for example int casts to str in this case.
   
   Maybe we could enable this by letting the user specify an explicit schema (as str in this case), although that currently also raises an error if a value doesn't match the specified type.
   
   From a quick search, someone else reported a similar issue in JIRA: https://issues.apache.org/jira/browse/ARROW-11978
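
One possible reading of "ignoring certain values" is to replace every value that does not match a declared type with null. The sketch below shows that reading with the standard library only; it is not pyarrow behaviour, and the helper `null_out_mismatches`, the `expected_types` mapping, and the sample rows are hypothetical.

```python
import json


def null_out_mismatches(ndjson_text, expected_types):
    """Replace values whose Python type differs from the declared one with None."""
    cleaned_rows = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        for key, expected in expected_types.items():
            value = row.get(key)
            # Types are compared exactly so that True/False do not count as int.
            if value is not None and type(value) is not expected:
                row[key] = None
        cleaned_rows.append(row)
    return cleaned_rows


# Hypothetical sample: the middle row holds a string where ints are expected.
sample = '{"id": 1}\n{"id": "x2"}\n{"id": 3}\n'
rows = null_out_mismatches(sample, {"id": int})
```

Note that this still needs a declared type per column, so it does not remove the need for some form of schema; it only changes what happens to the values that do not fit.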
   
   
   
   




[GitHub] [arrow] hugofragata commented on issue #12885: pyarrow.json.read_json with a type inconsistent column possible?

Posted by GitBox <gi...@apache.org>.
hugofragata commented on issue #12885:
URL: https://github.com/apache/arrow/issues/12885#issuecomment-1140957216

   Yes, I do agree with your responses that the intended outcome for this issue is not clear. I will close this issue. Thanks!




[GitHub] [arrow] pitrou commented on issue #12885: pyarrow.json.read_json with a type inconsistent column possible?

Posted by GitBox <gi...@apache.org>.
pitrou commented on issue #12885:
URL: https://github.com/apache/arrow/issues/12885#issuecomment-1131943054

   > Is it possible to still keep the default inference behaviour while also ignoring or casting values that don't conform?
   
   That doesn't seem to make much sense to me. If ignoring non-conforming values, then inference might choose any type it wants? (including bool or null?)
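
The ambiguity can be illustrated with a small standard-library sketch: if non-conforming values are simply dropped, every candidate type is a valid choice for the same column, and each choice keeps a different subset of the data. The helper `surviving_values` and the sample column are made up for the illustration.

```python
def surviving_values(values, candidate):
    """Keep values of exactly the candidate type; replace the rest with None."""
    return [v if type(v) is candidate else None for v in values]


# Hypothetical column mixing ints, a string and a bool.
column = [1, "a", True, 2]

# Each candidate type is a self-consistent inference result once mismatches are ignored.
outcomes = {t.__name__: surviving_values(column, t) for t in (int, str, bool)}
```

Here `outcomes["int"]` keeps 1 and 2, `outcomes["str"]` keeps only "a", and `outcomes["bool"]` keeps only True, so inference would need an extra rule (for example, a majority vote or a type ranking) to pick one.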

