You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/11 20:19:29 UTC
[GitHub] [arrow-rs] nevi-me opened a new pull request #539: generate parquet schema from rust struct
nevi-me opened a new pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539
# Which issue does this PR close?
None
# Rationale for this change
Users of `parquet_derive` currently have to write the schema of their Rust struct by hand.
This is inconvenient, and can be generated for them.
# What changes are included in this PR?
Adds `parquet::record::RecordWriter::schema()` trait member, and an implementation in `parquet_derive` to generate a schema for the user.
# Are there any user-facing changes?
Yes, new API. The breakage is probably irrelevant as I don't think that there are many users of `parquet_derive`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] bryantbiggs commented on pull request #539: generate parquet schema from rust struct
Posted by GitBox <gi...@apache.org>.
bryantbiggs commented on pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#issuecomment-877862396
nice!!! 🎉
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #539: generate parquet schema from rust struct
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#issuecomment-877857352
# [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#539](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e13b9e7) into [master](https://codecov.io/gh/apache/arrow-rs/commit/f1fb2b11bbd6350365de010d3e1d676a27602d3a?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f1fb2b1) will **decrease** coverage by `0.14%`.
> The diff coverage is `22.01%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/539/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #539 +/- ##
==========================================
- Coverage 82.60% 82.45% -0.15%
==========================================
Files 167 167
Lines 45984 46084 +100
==========================================
+ Hits 37984 37999 +15
- Misses 8000 8085 +85
```
| [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [parquet\_derive/src/lib.rs](https://codecov.io/gh/apache/arrow-rs/pull/539/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL2xpYi5ycw==) | `0.00% <0.00%> (ø)` | |
| [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/539/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `66.16% <12.50%> (-17.17%)` | :arrow_down: |
| [parquet\_derive\_test/src/lib.rs](https://codecov.io/gh/apache/arrow-rs/pull/539/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmVfdGVzdC9zcmMvbGliLnJz) | `100.00% <100.00%> (ø)` | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [f1fb2b1...e13b9e7](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb commented on pull request #539: generate parquet schema from rust struct
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#issuecomment-879230190
I plan to merge this later today (will be included in 5.0) unless anyone else wants more time to review
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb commented on a change in pull request #539: generate parquet schema from rust struct
Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#discussion_r668229869
##########
File path: parquet_derive/src/lib.rs
##########
@@ -69,7 +64,7 @@ mod parquet_field;
/// }
/// ];
///
-/// let schema = Arc::new(parse_message_type(schema_str).unwrap());
+/// let schema = samples.as_slice().schema();
Review comment:
maybe we could implement it for anything that implemented `IntoIterator<Item=P>` where `P: AsRef<Struct>` or whatever
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] nevi-me commented on a change in pull request #539: generate parquet schema from rust struct
Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#discussion_r668227649
##########
File path: parquet_derive/src/lib.rs
##########
@@ -69,7 +64,7 @@ mod parquet_field;
/// }
/// ];
///
-/// let schema = Arc::new(parse_message_type(schema_str).unwrap());
+/// let schema = samples.as_slice().schema();
Review comment:
Yeah, I didn't like it too, but `RecordWriter` is implemented on &[Struct], so this was the least disruptive change to the trait I could make
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] nevi-me commented on a change in pull request #539: generate parquet schema from rust struct
Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#discussion_r668233857
##########
File path: parquet_derive/src/lib.rs
##########
@@ -69,7 +64,7 @@ mod parquet_field;
/// }
/// ];
///
-/// let schema = Arc::new(parse_message_type(schema_str).unwrap());
+/// let schema = samples.as_slice().schema();
Review comment:
I'll try that out in future PRs. The code that gets generated iterates through the slice `n` times, where `n` is the number of fields. It's not a true record writer in the parquet sense, but it uses the column writer internally.
I think we'll be able to improve on it when I work on nested type support.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] nevi-me commented on pull request #539: generate parquet schema from rust struct
Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#issuecomment-877855410
Might be of interest to @xrl and @bryantbiggs as you were the original authors
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb commented on a change in pull request #539: generate parquet schema from rust struct
Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#discussion_r668201979
##########
File path: parquet_derive/src/lib.rs
##########
@@ -69,7 +64,7 @@ mod parquet_field;
/// }
/// ];
///
-/// let schema = Arc::new(parse_message_type(schema_str).unwrap());
+/// let schema = samples.as_slice().schema();
Review comment:
It would be cool to avoid having to use `as_slice()` but this PR seems significantly better than previously, so 👍 for me. We can make it even better in the future
##########
File path: parquet_derive/src/lib.rs
##########
@@ -121,6 +122,22 @@ pub fn parquet_record_writer(input: proc_macro::TokenStream) -> proc_macro::Toke
Ok(())
}
+
+ fn schema(&self) -> Result<parquet::schema::types::TypePtr, parquet::errors::ParquetError> {
+ use parquet::schema::types::Type as ParquetType;
+ use parquet::schema::types::TypePtr;
+ use parquet::basic::LogicalType;
+ use parquet::basic::*;
+
+ let mut fields: Vec<TypePtr> = Vec::new();
+ #(
Review comment:
this is pretty cool 🧙 👍
##########
File path: parquet_derive_test/src/lib.rs
##########
@@ -90,15 +102,31 @@ mod tests {
a_borrowed_string: &a_borrowed_string,
maybe_a_str: Some(&a_str[..]),
maybe_a_string: Some(a_str.clone()),
- magic_number: 100,
- low_quality_pi: 3.14,
- high_quality_pi: 3.1415,
- maybe_pi: Some(3.14),
- maybe_best_pi: Some(3.1415),
+ i16: -45,
+ i32: 456,
+ u64: 4563424,
+ maybe_u8: None,
+ maybe_i16: Some(3),
+ maybe_u32: None,
+ maybe_usize: Some(4456),
+ isize: -365,
+ float: 3.5,
+ double: std::f64::NAN,
+ maybe_float: None,
+ maybe_double: Some(std::f64::MAX),
borrowed_maybe_a_string: &maybe_a_string,
borrowed_maybe_a_str: &maybe_a_str,
}];
+ let schema = Arc::new(parse_message_type(schema_str).unwrap());
+ let generated_schema = drs.as_slice().schema().unwrap();
+
+ assert_eq!(&schema, &generated_schema);
Review comment:
❤️
##########
File path: parquet_derive/src/parquet_field.rs
##########
@@ -174,6 +174,50 @@ impl Field {
}
}
+ pub fn parquet_type(&self) -> proc_macro2::TokenStream {
+ // TODO: Support group types
Review comment:
maybe we should add notes about these TODO's in the docstrings (aka "group types are not yet supported") ?
##########
File path: parquet_derive_test/src/lib.rs
##########
@@ -57,27 +64,32 @@ mod tests {
#[test]
fn test_parquet_derive_hello() {
let file = get_temp_file("test_parquet_derive_hello", &[]);
- let schema_str = "message schema {
+
+ // The schema is not required, but this tests that the generated
Review comment:
👍
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] codecov-commenter commented on pull request #539: generate parquet schema from rust struct
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#issuecomment-877857352
# [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#539](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1984adf) into [master](https://codecov.io/gh/apache/arrow-rs/commit/f1fb2b11bbd6350365de010d3e1d676a27602d3a?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (f1fb2b1) will **decrease** coverage by `0.14%`.
> The diff coverage is `6.97%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/539/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #539 +/- ##
==========================================
- Coverage 82.60% 82.45% -0.15%
==========================================
Files 167 167
Lines 45984 46066 +82
==========================================
+ Hits 37984 37986 +2
- Misses 8000 8080 +80
```
| [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [parquet\_derive/src/lib.rs](https://codecov.io/gh/apache/arrow-rs/pull/539/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL2xpYi5ycw==) | `0.00% <0.00%> (ø)` | |
| [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/539/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `66.23% <0.00%> (-17.10%)` | :arrow_down: |
| [parquet\_derive\_test/src/lib.rs](https://codecov.io/gh/apache/arrow-rs/pull/539/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmVfdGVzdC9zcmMvbGliLnJz) | `100.00% <100.00%> (ø)` | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [f1fb2b1...1984adf](https://codecov.io/gh/apache/arrow-rs/pull/539?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] xrl commented on a change in pull request #539: generate parquet schema from rust struct
Posted by GitBox <gi...@apache.org>.
xrl commented on a change in pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539#discussion_r668239192
##########
File path: parquet_derive/src/lib.rs
##########
@@ -69,7 +64,7 @@ mod parquet_field;
/// }
/// ];
///
-/// let schema = Arc::new(parse_message_type(schema_str).unwrap());
+/// let schema = samples.as_slice().schema();
Review comment:
I appreciate the extra attention to this code. My team relies on the derive macro to translate simple structs to parquet, which we do very often in our data pipeline. I'd be happy to test out future, more comprehensive versions. I applaud you effort, I was too intimidated by the record writer to keep going!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb merged pull request #539: generate parquet schema from rust struct
Posted by GitBox <gi...@apache.org>.
alamb merged pull request #539:
URL: https://github.com/apache/arrow-rs/pull/539
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org