You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/11 20:19:29 UTC

[GitHub] [arrow-rs] nevi-me opened a new pull request #539: generate parquet schema from rust struct

nevi-me opened a new pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539


   # Which issue does this PR close?
   
   None
   
   # Rationale for this change
    
   Users of `parquet_derive` currently have to write the schema of their Rust struct by hand.
   This is inconvenient, and can be generated for them.
   
   # What changes are included in this PR?
   
   Adds `parquet::record::RecordWriter::schema()` trait member, and an implementation in `parquet_derive` to generate a schema for the user.
   
   # Are there any user-facing changes?
   
   Yes, new API. The breakage is probably irrelevant as I don't think that there are many users of `parquet_derive`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] bryantbiggs commented on pull request #539: generate parquet schema from rust struct

Posted by GitBox <gi...@apache.org>.
bryantbiggs commented on pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#issuecomment-877862396


   nice!!! 🎉 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #539: generate parquet schema from rust struct

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#issuecomment-877857352


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#539](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e13b9e7) into [master](https://codecov.io/gh/apache/arrow-rs/commit/f1fb2b11bbd6350365de010d3e1d676a27602d3a?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f1fb2b1) will **decrease** coverage by `0.14%`.
   > The diff coverage is `22.01%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/539/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #539      +/-   ##
   ==========================================
   - Coverage   82.60%   82.45%   -0.15%     
   ==========================================
     Files         167      167              
     Lines       45984    46084     +100     
   ==========================================
   + Hits        37984    37999      +15     
   - Misses       8000     8085      +85     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [parquet\_derive/src/lib.rs](https://codecov.io/gh/apache/arrow-rs/pull/539/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL2xpYi5ycw==) | `0.00% <0.00%> (ø)` | |
   | [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/539/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `66.16% <12.50%> (-17.17%)` | :arrow_down: |
   | [parquet\_derive\_test/src/lib.rs](https://codecov.io/gh/apache/arrow-rs/pull/539/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmVfdGVzdC9zcmMvbGliLnJz) | `100.00% <100.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [f1fb2b1...e13b9e7](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #539: generate parquet schema from rust struct

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#issuecomment-879230190


   I plan to merge this later today (will be included in 5.0) unless anyone else wants more time to review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on a change in pull request #539: generate parquet schema from rust struct

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#discussion_r668229869



##########
File path: parquet_derive/src/lib.rs
##########
@@ -69,7 +64,7 @@ mod parquet_field;
 ///     }
 ///   ];
 ///
-///  let schema = Arc::new(parse_message_type(schema_str).unwrap());
+///  let schema = samples.as_slice().schema();

Review comment:
       maybe we could implement it for anything that implemented `IntoIterator<Item=P>` where `P: AsRef<Struct>` or whatever




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] nevi-me commented on a change in pull request #539: generate parquet schema from rust struct

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#discussion_r668227649



##########
File path: parquet_derive/src/lib.rs
##########
@@ -69,7 +64,7 @@ mod parquet_field;
 ///     }
 ///   ];
 ///
-///  let schema = Arc::new(parse_message_type(schema_str).unwrap());
+///  let schema = samples.as_slice().schema();

Review comment:
       Yeah, I didn't like it too, but `RecordWriter` is implemented on &[Struct], so this was the least disruptive change to the trait I could make




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] nevi-me commented on a change in pull request #539: generate parquet schema from rust struct

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#discussion_r668233857



##########
File path: parquet_derive/src/lib.rs
##########
@@ -69,7 +64,7 @@ mod parquet_field;
 ///     }
 ///   ];
 ///
-///  let schema = Arc::new(parse_message_type(schema_str).unwrap());
+///  let schema = samples.as_slice().schema();

Review comment:
       I'll try that out in future PRs. The code that gets generated iterates through the slice `n` times, where `n` is the number of fields. It's not a true record writer in the parquet sense, but it uses the column writer internally.
   
   I think we'll be able to improve on it when I work on nested type support.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] nevi-me commented on pull request #539: generate parquet schema from rust struct

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#issuecomment-877855410


   Might be of interest to @xrl and @bryantbiggs as you were the original authors


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on a change in pull request #539: generate parquet schema from rust struct

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#discussion_r668201979



##########
File path: parquet_derive/src/lib.rs
##########
@@ -69,7 +64,7 @@ mod parquet_field;
 ///     }
 ///   ];
 ///
-///  let schema = Arc::new(parse_message_type(schema_str).unwrap());
+///  let schema = samples.as_slice().schema();

Review comment:
       It would be cool to avoid having to use `as_slice()` but this PR seems significantly better than previously, so 👍  for me. We can make it even better in the future

##########
File path: parquet_derive/src/lib.rs
##########
@@ -121,6 +122,22 @@ pub fn parquet_record_writer(input: proc_macro::TokenStream) -> proc_macro::Toke
 
         Ok(())
       }
+
+      fn schema(&self) -> Result<parquet::schema::types::TypePtr, parquet::errors::ParquetError> {
+        use parquet::schema::types::Type as ParquetType;
+        use parquet::schema::types::TypePtr;
+        use parquet::basic::LogicalType;
+        use parquet::basic::*;
+
+        let mut fields: Vec<TypePtr> = Vec::new();
+        #(

Review comment:
       this is pretty cool 🧙  👍 

##########
File path: parquet_derive_test/src/lib.rs
##########
@@ -90,15 +102,31 @@ mod tests {
             a_borrowed_string: &a_borrowed_string,
             maybe_a_str: Some(&a_str[..]),
             maybe_a_string: Some(a_str.clone()),
-            magic_number: 100,
-            low_quality_pi: 3.14,
-            high_quality_pi: 3.1415,
-            maybe_pi: Some(3.14),
-            maybe_best_pi: Some(3.1415),
+            i16: -45,
+            i32: 456,
+            u64: 4563424,
+            maybe_u8: None,
+            maybe_i16: Some(3),
+            maybe_u32: None,
+            maybe_usize: Some(4456),
+            isize: -365,
+            float: 3.5,
+            double: std::f64::NAN,
+            maybe_float: None,
+            maybe_double: Some(std::f64::MAX),
             borrowed_maybe_a_string: &maybe_a_string,
             borrowed_maybe_a_str: &maybe_a_str,
         }];
 
+        let schema = Arc::new(parse_message_type(schema_str).unwrap());
+        let generated_schema = drs.as_slice().schema().unwrap();
+
+        assert_eq!(&schema, &generated_schema);

Review comment:
       ❤️ 

##########
File path: parquet_derive/src/parquet_field.rs
##########
@@ -174,6 +174,50 @@ impl Field {
         }
     }
 
+    pub fn parquet_type(&self) -> proc_macro2::TokenStream {
+        // TODO: Support group types

Review comment:
       maybe we should add notes about these TODO's in the docstrings (aka "group types are not yet supported") ?

##########
File path: parquet_derive_test/src/lib.rs
##########
@@ -57,27 +64,32 @@ mod tests {
     #[test]
     fn test_parquet_derive_hello() {
         let file = get_temp_file("test_parquet_derive_hello", &[]);
-        let schema_str = "message schema {
+
+        // The schema is not required, but this tests that the generated

Review comment:
       👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter commented on pull request #539: generate parquet schema from rust struct

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#issuecomment-877857352


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#539](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1984adf) into [master](https://codecov.io/gh/apache/arrow-rs/commit/f1fb2b11bbd6350365de010d3e1d676a27602d3a?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f1fb2b1) will **decrease** coverage by `0.14%`.
   > The diff coverage is `6.97%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/539/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #539      +/-   ##
   ==========================================
   - Coverage   82.60%   82.45%   -0.15%     
   ==========================================
     Files         167      167              
     Lines       45984    46066      +82     
   ==========================================
   + Hits        37984    37986       +2     
   - Misses       8000     8080      +80     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [parquet\_derive/src/lib.rs](https://codecov.io/gh/apache/arrow-rs/pull/539/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL2xpYi5ycw==) | `0.00% <0.00%> (ø)` | |
   | [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/539/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `66.23% <0.00%> (-17.10%)` | :arrow_down: |
   | [parquet\_derive\_test/src/lib.rs](https://codecov.io/gh/apache/arrow-rs/pull/539/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmVfdGVzdC9zcmMvbGliLnJz) | `100.00% <100.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [f1fb2b1...1984adf](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] xrl commented on a change in pull request #539: generate parquet schema from rust struct

Posted by GitBox <gi...@apache.org>.
xrl commented on a change in pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#discussion_r668239192



##########
File path: parquet_derive/src/lib.rs
##########
@@ -69,7 +64,7 @@ mod parquet_field;
 ///     }
 ///   ];
 ///
-///  let schema = Arc::new(parse_message_type(schema_str).unwrap());
+///  let schema = samples.as_slice().schema();

Review comment:
       I appreciate the extra attention to this code. My team relies on the derive macro to translate simple structs to parquet, which we do very often in our data pipeline. I'd be happy to test out future, more comprehensive versions. I applaud you effort, I was too intimidated by the record writer to keep going!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb merged pull request #539: generate parquet schema from rust struct

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org