You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/05/24 21:39:51 UTC

[GitHub] [arrow-rs] alamb opened a new pull request, #4278: Add `Debug` impls for `ArrowWriter` and `SerializedFileWriter`

alamb opened a new pull request, #4278:
URL: https://github.com/apache/arrow-rs/pull/4278

   This is an alternate, more targeted version of https://github.com/apache/arrow-rs/pull/4276
   
   # Which issue does this PR close?
   
   N/A
   
   # Rationale for this change
    
   This caught me when working on https://github.com/influxdata/influxdb_iox/pull/7855
   
   I had to manually derive a `Debug` impl for a structure because the `ArrowWriter` didn't
   
   
   # What changes are included in this PR?
   
   1. Add `Debug` impls for `ArrowWriter` and `SerializedFileWriter`
   
   
   # Are there any user-facing changes?
   More `Debug`! 
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #4278: Add `Debug` impls for `ArrowWriter` and `SerializedFileWriter`

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4278:
URL: https://github.com/apache/arrow-rs/pull/4278#issuecomment-1562753679

   Thank you for this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4278: Add `Debug` impls for `ArrowWriter` and `SerializedFileWriter`

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4278:
URL: https://github.com/apache/arrow-rs/pull/4278#discussion_r1205389068


##########
parquet/src/file/writer.rs:
##########
@@ -147,6 +148,18 @@ pub struct SerializedFileWriter<W: Write> {
     kv_metadatas: Vec<KeyValue>,
 }
 
+impl<W: Write> Debug for SerializedFileWriter<W> {

Review Comment:
   Aah, yeah makes sense



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #4278: Add `Debug` impls for `ArrowWriter` and `SerializedFileWriter`

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #4278:
URL: https://github.com/apache/arrow-rs/pull/4278


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #4278: Add `Debug` impls for `ArrowWriter` and `SerializedFileWriter`

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #4278:
URL: https://github.com/apache/arrow-rs/pull/4278#discussion_r1204852513


##########
parquet/src/file/writer.rs:
##########
@@ -147,6 +148,18 @@ pub struct SerializedFileWriter<W: Write> {
     kv_metadatas: Vec<KeyValue>,
 }
 
+impl<W: Write> Debug for SerializedFileWriter<W> {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        // implement Debug so this can be used with #[derive(Debug)]
+        // in client code rather than actually listing all the fields
+        f.debug_struct("SerializedFileWriter<W>")

Review Comment:
   Is it meaningful to have `W` in debug? No much info from it. Looks `ArrowWriter` doesn't have it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #4278: Add `Debug` impls for `ArrowWriter` and `SerializedFileWriter`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #4278:
URL: https://github.com/apache/arrow-rs/pull/4278#discussion_r1205359508


##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -92,6 +93,31 @@ pub struct ArrowWriter<W: Write> {
     max_row_group_size: usize,
 }
 
+impl<W: Write> Debug for ArrowWriter<W> {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        let mut buffered_batches = 0;

Review Comment:
   fixed



##########
parquet/src/file/writer.rs:
##########
@@ -147,6 +148,18 @@ pub struct SerializedFileWriter<W: Write> {
     kv_metadatas: Vec<KeyValue>,
 }
 
+impl<W: Write> Debug for SerializedFileWriter<W> {

Review Comment:
   I tried to do this with
   
   ```diff
   diff --git a/parquet/src/file/writer.rs b/parquet/src/file/writer.rs
   index 93e8319b0..3ab50fb4b 100644
   --- a/parquet/src/file/writer.rs
   +++ b/parquet/src/file/writer.rs
   @@ -48,6 +48,7 @@ use crate::schema::types::{
    /// A wrapper around a [`Write`] that keeps track of the number
    /// of bytes that have been written. The given [`Write`] is wrapped
    /// with a [`BufWriter`] to optimize writing performance.
   +#[derive(Debug)]
    pub struct TrackedWrite<W: Write> {
        inner: BufWriter<W>,
        bytes_written: usize,
   @@ -134,6 +135,7 @@ pub type OnCloseRowGroup<'a> = Box<
    /// - Once finished writing row group, close row group writer by calling `close`
    /// - Write subsequent row groups, if necessary.
    /// - After all row groups have been written, close the file writer using `close` method.
   +#[derive(Debug)]
    pub struct SerializedFileWriter<W: Write> {
        buf: TrackedWrite<W>,
        schema: TypePtr,
   @@ -148,17 +150,6 @@ pub struct SerializedFileWriter<W: Write> {
        kv_metadatas: Vec<KeyValue>,
    }
    
   -impl<W: Write> Debug for SerializedFileWriter<W> {
   -    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
   -        // implement Debug so this can be used with #[derive(Debug)]
   -        // in client code rather than actually listing all the fields
   -        f.debug_struct("SerializedFileWriter")
   -            .field("descr", &self.descr)
   -            .field("row_group_index", &self.row_group_index)
   -            .field("kv_metadatas", &self.kv_metadatas)
   -            .finish_non_exhaustive()
   -    }
   -}
    
    impl<W: Write> SerializedFileWriter<W> {
        /// Creates new file writer.
   
   ```
   
   However, the compiler is complains like this ( I think it is saying I need to restrict the bound on `SerializedFileWriter` which I would prefer not to do as it would be backwards incompatible)
   
   ```
       Checking parquet v40.0.0 (/Users/alamb/Software/arrow-rs/parquet)
   error[E0277]: `W` doesn't implement `std::fmt::Debug`
      --> parquet/src/arrow/arrow_writer/mod.rs:108:30
       |
   108 |             .field("writer", &self.writer)
       |                              ^^^^^^^^^^^^ `W` cannot be formatted using `{:?}` because it doesn't implement `std::fmt::Debug`
       |
   note: required for `file::writer::SerializedFileWriter<W>` to implement `std::fmt::Debug`
      --> parquet/src/file/writer.rs:138:10
       |
   138 | #[derive(Debug)]
       |          ^^^^^ unsatisfied trait bound introduced in this `derive` macro
       = note: required for the cast from `file::writer::SerializedFileWriter<W>` to the object type `dyn std::fmt::Debug`
       = note: this error originates in the derive macro `Debug` (in Nightly builds, run with -Z macro-backtrace for more info)
   help: consider further restricting this bound
       |
   96  | impl<W: Write + std::fmt::Debug> Debug for ArrowWriter<W> {
       |               +++++++++++++++++
   
   ```
   
   



##########
parquet/src/file/writer.rs:
##########
@@ -147,6 +148,18 @@ pub struct SerializedFileWriter<W: Write> {
     kv_metadatas: Vec<KeyValue>,
 }
 
+impl<W: Write> Debug for SerializedFileWriter<W> {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        // implement Debug so this can be used with #[derive(Debug)]
+        // in client code rather than actually listing all the fields
+        f.debug_struct("SerializedFileWriter<W>")

Review Comment:
   removed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4278: Add `Debug` impls for `ArrowWriter` and `SerializedFileWriter`

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4278:
URL: https://github.com/apache/arrow-rs/pull/4278#discussion_r1204794460


##########
parquet/src/file/writer.rs:
##########
@@ -147,6 +148,18 @@ pub struct SerializedFileWriter<W: Write> {
     kv_metadatas: Vec<KeyValue>,
 }
 
+impl<W: Write> Debug for SerializedFileWriter<W> {

Review Comment:
   FWIW `#[derive(Debug)]` on `TrackedWrite` would allow similar here, without needing a manual impl. SerializedFileWriter doesn't contain any non-public types on it



##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -92,6 +93,31 @@ pub struct ArrowWriter<W: Write> {
     max_row_group_size: usize,
 }
 
+impl<W: Write> Debug for ArrowWriter<W> {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        let mut buffered_batches = 0;

Review Comment:
   I think buffered_batches is just `self.buffer.len()`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org