Posted to github@arrow.apache.org by "benibus (via GitHub)" <gi...@apache.org> on 2023/04/25 22:10:36 UTC

[GitHub] [arrow] benibus commented on issue #35096: [Python] The parse_options parameter newlines_in_values doesn't work when reading JSON

benibus commented on issue #35096:
URL: https://github.com/apache/arrow/issues/35096#issuecomment-1522485993

   Sorry for the delay. `newlines_in_values` shouldn't actually affect the resulting table. It mostly serves as a hint to the reader that the source's JSON objects can't be reliably delimited by raw newlines, so a more expensive chunking path is taken before each chunk is parsed individually. If the option is left off for such input, parsing errors are very likely.
   
   In your case, with `newlines_in_values=false`, you would get an error if you set `ReadOptions::block_size` to 64 (the file is 120 bytes, so it gets split into more than one block). With `newlines_in_values=true`, the same read would work just fine.
   
   That being said, I'm not entirely sure why `newlines_in_values` isn't in `ReadOptions` instead. Looking at the C++ implementation, the option doesn't appear to be used by the parser at all.

