You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/16 12:46:17 UTC

[GitHub] [arrow-rs] sum12 opened a new pull request #1451: Allow json reader/decoder to work with format_strings for each field and use Parser trait correspondingly

sum12 opened a new pull request #1451:
URL: https://github.com/apache/arrow-rs/pull/1451


   # Which issue does this PR close?
   add implementation for remaining basic types as needed by the json::decoder
   
   
   # Rationale for this change
   This approach is seem cleaner than the on e initially proposed in #1301 
   
   # What changes are included in this PR?
   Non-Formatted parsing for timestamp types is included, but formatted parsing is still missing and can be done as a followup PR.
   When `serde` thinks that the `value` is a `string` then these new parsers will kick in and parse the string using provided format  or some predefined set of format strings (for timestamp types)
   for time32/time64 based types no format is specified
   
   # Are there any user-facing changes?
   A new `format_strings` parameter is added to the reader and decoder
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] sum12 commented on a change in pull request #1451: Allow json reader/decoder to work with format_strings for each field and use Parser trait correspondingly

Posted by GitBox <gi...@apache.org>.
sum12 commented on a change in pull request #1451:
URL: https://github.com/apache/arrow-rs/pull/1451#discussion_r838163495



##########
File path: arrow/src/json/reader.rs
##########
@@ -589,11 +592,13 @@ impl Decoder {
         schema: SchemaRef,
         batch_size: usize,
         projection: Option<Vec<String>>,
+        format_strings: Option<HashMap<String, String>>,

Review comment:
       I often wondered too. Wouldn't it allow the parser to diverge from schema ? We would need a runtime validation of what schema says and formaters actually return.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on a change in pull request #1451: Allow json reader/decoder to work with format_strings for each field and use Parser trait correspondingly

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1451:
URL: https://github.com/apache/arrow-rs/pull/1451#discussion_r837825084



##########
File path: arrow/src/json/reader.rs
##########
@@ -580,6 +581,8 @@ pub struct Decoder {
     projection: Option<Vec<String>>,
     /// Batch size (number of records to load each time)
     batch_size: usize,
+    /// optional HashMap of column names to its format strings

Review comment:
       ```suggestion
       /// optional HashMap of column names to its format string
   ```

##########
File path: arrow/src/json/reader.rs
##########
@@ -589,11 +592,13 @@ impl Decoder {
         schema: SchemaRef,
         batch_size: usize,
         projection: Option<Vec<String>>,
+        format_strings: Option<HashMap<String, String>>,

Review comment:
       I can't help but wonder if we would be better off allowing users to provide their own parsers to support the most general case. For example, what if this was
   
   ```suggestion
           formaters: Option<HashMap<String, Arc<dyn Parser>>,
   ```
   
   And then the user could pass whatever custom parser they wanted to the JSON reader?
   
   

##########
File path: arrow/src/json/reader.rs
##########
@@ -38,7 +38,7 @@
 //!
 //! let file = File::open("test/data/basic.json").unwrap();
 //!
-//! let mut json = json::Reader::new(BufReader::new(file), Arc::new(schema), 1024, None);
+//! let mut json = json::Reader::new(BufReader::new(file), Arc::new(schema), 1024, None, None);

Review comment:
       🤔  it is probably time to make some sort of `JsonReaderOptions` struct so we can add new options without changing the public API




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] sum12 commented on a change in pull request #1451: Allow json reader/decoder to work with format_strings for each field and use Parser trait correspondingly

Posted by GitBox <gi...@apache.org>.
sum12 commented on a change in pull request #1451:
URL: https://github.com/apache/arrow-rs/pull/1451#discussion_r841073393



##########
File path: arrow/src/json/reader.rs
##########
@@ -38,7 +38,7 @@
 //!
 //! let file = File::open("test/data/basic.json").unwrap();
 //!
-//! let mut json = json::Reader::new(BufReader::new(file), Arc::new(schema), 1024, None);
+//! let mut json = json::Reader::new(BufReader::new(file), Arc::new(schema), 1024, None, None);

Review comment:
       added a `DecoderOptions` as decoder is like to get those new options in the future than the reader. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter commented on pull request #1451: Allow json reader/decoder to work with format_strings for each field and use Parser trait correspondingly

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #1451:
URL: https://github.com/apache/arrow-rs/pull/1451#issuecomment-1086632208


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1451?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1451](https://codecov.io/gh/apache/arrow-rs/pull/1451?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (755e4ad) into [master](https://codecov.io/gh/apache/arrow-rs/commit/717216f2a296566b10d786a113ceea937e95a459?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (717216f) will **increase** coverage by `0.08%`.
   > The diff coverage is `79.61%`.
   
   > :exclamation: Current head 755e4ad differs from pull request most recent head fff5ccc. Consider uploading reports for the commit fff5ccc to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #1451      +/-   ##
   ==========================================
   + Coverage   82.67%   82.76%   +0.08%     
   ==========================================
     Files         185      190       +5     
     Lines       53866    54737     +871     
   ==========================================
   + Hits        44535    45304     +769     
   - Misses       9331     9433     +102     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1451?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow-rs/pull/1451/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2FycmF5X2xpc3QucnM=) | `95.52% <ø> (ø)` | |
   | [arrow/src/array/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1451/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L21vZC5ycw==) | `100.00% <ø> (ø)` | |
   | [...ion-testing/src/bin/arrow-json-integration-test.rs](https://codecov.io/gh/apache/arrow-rs/pull/1451/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aW50ZWdyYXRpb24tdGVzdGluZy9zcmMvYmluL2Fycm93LWpzb24taW50ZWdyYXRpb24tdGVzdC5ycw==) | `0.00% <0.00%> (ø)` | |
   | [integration-testing/src/lib.rs](https://codecov.io/gh/apache/arrow-rs/pull/1451/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aW50ZWdyYXRpb24tdGVzdGluZy9zcmMvbGliLnJz) | `0.00% <0.00%> (ø)` | |
   | [parquet/src/arrow/levels.rs](https://codecov.io/gh/apache/arrow-rs/pull/1451/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvbGV2ZWxzLnJz) | `84.80% <25.00%> (+0.12%)` | :arrow_up: |
   | [arrow/src/util/reader\_parser.rs](https://codecov.io/gh/apache/arrow-rs/pull/1451/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL3V0aWwvcmVhZGVyX3BhcnNlci5ycw==) | `67.39% <33.33%> (+2.17%)` | :arrow_up: |
   | [arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow-rs/pull/1451/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2J1aWxkZXIucnM=) | `86.68% <50.00%> (-0.05%)` | :arrow_down: |
   | [arrow/src/array/equal/list.rs](https://codecov.io/gh/apache/arrow-rs/pull/1451/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2VxdWFsL2xpc3QucnM=) | `79.38% <59.57%> (-9.85%)` | :arrow_down: |
   | [parquet/src/arrow/array\_reader/builder.rs](https://codecov.io/gh/apache/arrow-rs/pull/1451/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvYXJyYXlfcmVhZGVyL2J1aWxkZXIucnM=) | `66.30% <66.30%> (ø)` | |
   | [arrow/src/array/transform/fixed\_size\_list.rs](https://codecov.io/gh/apache/arrow-rs/pull/1451/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9maXhlZF9zaXplX2xpc3QucnM=) | `73.68% <73.68%> (ø)` | |
   | ... and [34 more](https://codecov.io/gh/apache/arrow-rs/pull/1451/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1451?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1451?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [717216f...fff5ccc](https://codecov.io/gh/apache/arrow-rs/pull/1451?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #1451: Allow json reader/decoder to work with format_strings for each field and use Parser trait correspondingly

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1451:
URL: https://github.com/apache/arrow-rs/pull/1451#issuecomment-1075722835


   Will try and check this out tomorrow


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org