You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/11 21:36:16 UTC

[GitHub] [arrow-rs] alamb opened a new pull request, #1539: Consolidate JSON reader options

alamb opened a new pull request, #1539:
URL: https://github.com/apache/arrow-rs/pull/1539

   # Which issue does this PR close?
   
   Draft as it builds on https://github.com/apache/arrow-rs/pull/1451 from @sum12 .
   
   Closes https://github.com/apache/arrow-rs/issues/1538
   
   # Rationale for this change
   @sum12 started refactoring options into a common struct in https://github.com/apache/arrow-rs/pull/1451 and I wanted to finish that work to minimize API churn when we release the next arrow
   
   
   # What changes are included in this PR?
   1. Move `batch_size` to DecoderOptions as well.
   2. Consolidate field handling in `ReaderBuilder`
   
   # Are there any user-facing changes?
   Yes -- if they use the low level Json decoder, the options are now encoded in a struct rather than as parameters


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on pull request #1539: Consolidate JSON reader options

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #1539:
URL: https://github.com/apache/arrow-rs/pull/1539#issuecomment-1096744594

   I would like to merge this prior to creating the arrow 12 release candidate later this week (so that we minimize the API churn) -- @nevi-me  and @houqp do you perhaps have a moment to review this?
   
   cc @sum12 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #1539: Consolidate JSON reader options

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #1539:
URL: https://github.com/apache/arrow-rs/pull/1539#discussion_r847756234


##########
arrow/src/json/reader.rs:
##########
@@ -1606,26 +1607,8 @@ pub struct ReaderBuilder {
     ///
     /// If a number is not provided, all the records are read.
     max_records: Option<usize>,
-    /// Batch size (number of records to load each time)
-    ///
-    /// The default batch size when using the `ReaderBuilder` is 1024 records
-    batch_size: usize,
-    /// Optional projection for which columns to load (zero-based column indices)
-    projection: Option<Vec<String>>,
-    /// optional HashMap of column names to format strings
-    format_strings: Option<HashMap<String, String>>,
-}
-
-impl Default for ReaderBuilder {
-    fn default() -> Self {
-        Self {
-            schema: None,
-            max_records: None,
-            batch_size: 1024,

Review Comment:
   this default is moved into `DecoderOptions`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on pull request #1539: Consolidate JSON reader options

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #1539:
URL: https://github.com/apache/arrow-rs/pull/1539#issuecomment-1098024536

   > It took me a while to work out why there is a separation between the `Decoder` and a `Reader` and therefore we end up with nested builders, but I think this makes sense. We want to allow people to construct a `Decoder` with arbitrary streams of `serde_json::Value` and convert them to `RecordBatch`, whereas `Reader` only supports files 👍
   
   I tried to encode this insight into the docs here: https://github.com/apache/arrow-rs/pull/1559


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb merged pull request #1539: Consolidate JSON reader options

Posted by GitBox <gi...@apache.org>.
alamb merged PR #1539:
URL: https://github.com/apache/arrow-rs/pull/1539


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org