
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/01 13:36:07 UTC

[GitHub] [arrow] nevi-me commented on pull request #7309: ARROW-8993: [Rust] support reading gzipped json files

nevi-me commented on pull request #7309:
URL: https://github.com/apache/arrow/pull/7309#issuecomment-636864911

> I agree that placing the burden on the user is a bad idea. However, there are situations where we just can't seek back to the start (S3 is one example). Maybe we could have a specific implementation for `Seek + Read` that seeks back to the start, and one for `Read` only that does not. However, this would need specialization, so more nightly dependencies.

Okay, in that case I could support not seeking back to the start of the input. One downside, though, is that when no schema is supplied and the readers (CSV and JSON) infer it, we do need to reset the input to its starting position. I've only briefly looked at the CSV code; if it's doable there, we could find a solution.
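
A rough sketch of both routes, on stable Rust with only the standard library. The function names (`sample_lines`, `infer_then_rewind`, `infer_then_replay`) are hypothetical and are not part of `arrow`. Real schema inference is replaced by reading the first `max_lines` lines, because the point here is only how to get back to the start afterwards. With `Read + Seek` the input is rewound. With `Read` alone, the consumed bytes are kept and chained back in front of the rest, so no seek is needed. Using two separately named functions sidesteps specialization.

```rust
use std::io::{self, BufRead, BufReader, Cursor, Read, Seek, SeekFrom};

/// Hypothetical stand-in for schema inference: reads up to `max_lines`
/// lines (newline included) so they could be inspected. Consumes input.
fn sample_lines<R: BufRead>(reader: &mut R, max_lines: usize) -> io::Result<Vec<String>> {
    let mut lines = Vec::new();
    for _ in 0..max_lines {
        let mut line = String::new();
        if reader.read_line(&mut line)? == 0 {
            break; // end of input
        }
        lines.push(line);
    }
    Ok(lines)
}

/// `Read + Seek`: sample the input, then seek back to where it started.
fn infer_then_rewind<R: Read + Seek>(mut input: R, max_lines: usize) -> io::Result<(Vec<String>, R)> {
    let start = input.stream_position()?;
    let mut buffered = BufReader::new(&mut input);
    let sample = sample_lines(&mut buffered, max_lines)?;
    // Release the borrow of `input`; the BufReader may have read ahead,
    // but the seek below undoes that.
    drop(buffered);
    input.seek(SeekFrom::Start(start))?;
    Ok((sample, input))
}

/// `Read` only: keep the sampled bytes and replay them in front of the rest.
fn infer_then_replay<R: Read>(input: R, max_lines: usize) -> io::Result<(Vec<String>, impl BufRead)> {
    let mut buffered = BufReader::new(input);
    let sample = sample_lines(&mut buffered, max_lines)?;
    let consumed = sample.concat().into_bytes();
    // Chain onto `buffered` (not the raw input) so read-ahead bytes are kept.
    Ok((sample, Cursor::new(consumed).chain(buffered)))
}
```

The replay route costs memory proportional to the sampled lines only, and it works with any source that implements just `Read`.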

Regarding specialization: we already depend on it, and that is unlikely to change soon, so I'd say it's an option.

With regard to `dyn Read`, please have a look at the `arrow::csv::reader` code. We already support a `BufReader<R: Read>` there, so I think we can do the same here without boxing the reader the way you've done so far. The primary (perhaps only) reason we still use `File` in `arrow::json` is that nobody has needed anything else, or at least nobody has raised the issue.
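
To illustrate the generic approach, here is a hypothetical `LineJsonReader`. It is not the actual `arrow::json` type. It stores a `BufReader<R>` with `R: Read` instead of a `Box<dyn Read>`. It only splits newline-delimited input into records and does no JSON parsing.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical line-oriented JSON reader, generic over its input instead
/// of boxing it; the concrete `R` is known at compile time.
struct LineJsonReader<R: Read> {
    reader: BufReader<R>,
}

impl<R: Read> LineJsonReader<R> {
    fn new(input: R) -> Self {
        LineJsonReader { reader: BufReader::new(input) }
    }

    /// Returns the next non-empty line (one JSON record), or `None` at EOF.
    fn next_record(&mut self) -> io::Result<Option<String>> {
        let mut line = String::new();
        loop {
            line.clear();
            if self.reader.read_line(&mut line)? == 0 {
                return Ok(None);
            }
            let trimmed = line.trim();
            if !trimmed.is_empty() {
                return Ok(Some(trimmed.to_string()));
            }
        }
    }
}
```

Because `R` is a type parameter, you can pass any of these without boxing: a `std::fs::File`, an in-memory `Cursor`, or a decompressing reader such as `flate2::read::GzDecoder<File>`, which implements `Read`. Calls are statically dispatched.
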
___

One dramatic alternative would be to always require a schema and leave inference to the user. We could then consume the buffered reader (`mut reader: BufReader<R>` instead of `reader: &mut BufReader<R>`), so that we don't leave the user with a file handle that has already been partially or fully consumed.
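
A minimal sketch of that alternative. The `Schema` and `JsonReader` types are hypothetical stand-ins for the Arrow ones. The schema is mandatory and the `BufReader` is taken by value, so once the reader is constructed, the caller can no longer touch the partially consumed input.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical stand-in for an Arrow schema: just the field names.
struct Schema {
    fields: Vec<String>,
}

/// Hypothetical reader: the schema is required (no inference), and the
/// buffered input is owned, so the caller keeps no half-read handle.
struct JsonReader<R: Read> {
    schema: Schema,
    reader: BufReader<R>,
}

impl<R: Read> JsonReader<R> {
    /// Takes `reader` by value rather than as `&mut BufReader<R>`.
    fn new(reader: BufReader<R>, schema: Schema) -> Self {
        JsonReader { schema, reader }
    }

    fn schema(&self) -> &Schema {
        &self.schema
    }

    /// Reads the remaining non-empty lines as raw JSON records.
    fn records(&mut self) -> io::Result<Vec<String>> {
        let mut out = Vec::new();
        for line in (&mut self.reader).lines() {
            let line = line?;
            if !line.trim().is_empty() {
                out.push(line);
            }
        }
        Ok(out)
    }
}
```

Passing `input` to `JsonReader::new` moves it. Any later use of `input` by the caller is therefore a compile-time error rather than a silent read from the middle of the file.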

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org