Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/11 22:15:23 UTC

[GitHub] [arrow-rs] ScottSyms opened a new issue #1302: module 'data_type' is private in Rust Parquet 8.0.0

ScottSyms opened a new issue #1302:
URL: https://github.com/apache/arrow-rs/issues/1302


   **Describe the bug**
   A clear and concise description of what the bug is.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   
   **Expected behavior**
   A clear and concise description of what you expected to happen.
   
   **Additional context**
   Add any other context about the problem here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] ScottSyms closed issue #1302: module 'data_type' is private in Rust Parquet 8.0.0

Posted by GitBox <gi...@apache.org>.
ScottSyms closed issue #1302:
URL: https://github.com/apache/arrow-rs/issues/1302


   





[GitHub] [arrow-rs] ScottSyms edited a comment on issue #1302: module 'data_type' is private in Rust Parquet 8.0.0

Posted by GitBox <gi...@apache.org>.
ScottSyms edited a comment on issue #1302:
URL: https://github.com/apache/arrow-rs/issues/1302#issuecomment-1036692748


   I'm running some code from Stack Overflow that works with versions of the Rust parquet crate earlier than 8.0.0. The error generated by the **use parquet::data_type::ByteArray;** import suggests that the data_type module is no longer public. I'm not sure whether this is an intended change in the release or a bug.
   
   ```
   use std::{fs, path::Path, sync::Arc};
   use parquet::{column::writer::ColumnWriter, data_type::ByteArray, file::{
       properties::WriterProperties,
       writer::{FileWriter, SerializedFileWriter},
   }, schema::parser::parse_message_type};
   
   fn main() {
       let path = Path::new("./sample.parquet");
   
       let message_type = "
           message schema {
               REQUIRED INT32 b;
               REQUIRED BINARY msg (UTF8);
           }
       ";
       let schema = Arc::new(parse_message_type(message_type).unwrap());
       let props = Arc::new(WriterProperties::builder().build());
       let file = fs::File::create(&path).unwrap();
   
       let mut rows: i64 = 0;
       let data = vec![
           (10, "A"),
           (20, "B"),
           (30, "C"),
           (40, "D"),
       ];
   
       let mut writer = SerializedFileWriter::new(file, schema, props).unwrap();
       for (key, value) in data {
           let mut row_group_writer = writer.next_row_group().unwrap();
           let id_writer = row_group_writer.next_column().unwrap();
           if let Some(mut writer) = id_writer {
               match writer {
                   ColumnWriter::Int32ColumnWriter(ref mut typed) => {
                       let values = vec![key];
                       rows +=
                           typed.write_batch(&values[..], None, None).unwrap() as i64;
                   },
                   _ => {
                       unimplemented!();
                   }
               }
               row_group_writer.close_column(writer).unwrap();
           }
           let data_writer = row_group_writer.next_column().unwrap();
           if let Some(mut writer) = data_writer {
               match writer {
                   ColumnWriter::ByteArrayColumnWriter(ref mut typed) => {
                       let values = ByteArray::from(value);
                       rows += typed.write_batch(&[values], None, None).unwrap() as i64;
                   }
                   _ => {
                       unimplemented!();
                   }
               }
               row_group_writer.close_column(writer).unwrap();
           }
           writer.close_row_group(row_group_writer).unwrap();
       }
       writer.close().unwrap();
   
       println!("Wrote {}", rows);
   
   }
   ```
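
The "module is private" error reported above can be reproduced without the parquet crate. The following is a minimal, self-contained sketch of Rust's module visibility rules. The `library` and `public_types` modules and the `byte_len` helper are hypothetical stand-ins invented for illustration and do not reflect the actual layout of the parquet source.

```rust
// Hypothetical stand-in for a library crate; NOT the real parquet source.
mod library {
    // Private module: paths through `library::data_type` are rejected
    // by the compiler everywhere outside `library`.
    mod data_type {
        #[derive(Debug, PartialEq)]
        pub struct ByteArray(pub Vec<u8>);

        impl From<&str> for ByteArray {
            fn from(s: &str) -> Self {
                ByteArray(s.as_bytes().to_vec())
            }
        }
    }

    // Public module (hypothetical name) that re-exports the type, so it is
    // reachable even though `data_type` itself stays private.
    pub mod public_types {
        pub use super::data_type::ByteArray;
    }
}

// Uncommenting the next line fails with
// error[E0603]: module `data_type` is private
// use library::data_type::ByteArray;
use library::public_types::ByteArray;

// Illustrative helper: number of bytes stored for a string value.
fn byte_len(value: &str) -> usize {
    ByteArray::from(value).0.len()
}

fn main() {
    println!("{}", byte_len("A"));
}
```

A library can change which of its modules are `pub` between releases, so an import path that compiled against one version may be rejected by the next even when the type itself still exists.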





[GitHub] [arrow-rs] alamb commented on issue #1302: module 'data_type' is private in Rust Parquet 8.0.0

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1302:
URL: https://github.com/apache/arrow-rs/issues/1302#issuecomment-1037224511


   Hi @ScottSyms -- thanks for the report, and sorry for the issue you are encountering.
   
   I think this is fixed in arrow 9.0.0 (in https://github.com/apache/arrow-rs/pull/1244 from @tustvold), which is due to be released later today or tomorrow.
   
   There is a workaround (add the `experimental` feature) described here: https://github.com/apache/arrow-rs/issues/1032#issuecomment-1023952706 if you would like to use parquet 8.0.0.
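
The workaround described in this comment amounts to enabling a Cargo feature on the dependency. A minimal sketch of the `Cargo.toml` entry follows, assuming the feature is named `experimental` as stated in the comment and that the project needs no other parquet features. The linked comment on issue 1032 remains the authoritative description.

```toml
[dependencies]
# Assumption: stay on 8.0.0 and opt in to the `experimental` feature
# named in the comment above.
parquet = { version = "8.0.0", features = ["experimental"] }
```

According to the same comment, upgrading to 9.0.0 once it is released should make the workaround unnecessary.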











[GitHub] [arrow-rs] ScottSyms commented on issue #1302: module 'data_type' is private in Rust Parquet 8.0.0

Posted by GitBox <gi...@apache.org>.
ScottSyms commented on issue #1302:
URL: https://github.com/apache/arrow-rs/issues/1302#issuecomment-1037359489


   Awesome- thanks for the quick response!!







