Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/22 07:19:54 UTC

[GitHub] [arrow-datafusion] yuribudilov opened a new issue #769: arrow::util::pretty::pretty_format_batches missing

yuribudilov opened a new issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769


   Hello
   My apologies for a novice Arrow question.
   I am not able to compile the code sample because the "pretty" module is missing from arrow::util.
   Using Rust 1.53.0 Stable.
   Toml is:
   [package]
   name = "test_arrow"
   version = "0.1.0"
   edition = "2018"
   [dependencies]
   arrow = "5.0.0"
   datafusion = "4.0.0"
   tokio = "1.8.2"
   
   // compilation can not find this:
   use arrow::util::pretty::print_batches;
   // also this fails to compile:
   let pretty_results = arrow::util::pretty::pretty_format_batches(&results)?;
   
   Error: cannot find 'pretty' in util.
   
   What am I doing wrong please?
   
   thank you very much
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb closed issue #769: arrow::util::pretty::pretty_format_batches missing

Posted by GitBox <gi...@apache.org>.
alamb closed issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769


   





[GitHub] [arrow-datafusion] alamb commented on issue #769: arrow::util::pretty::pretty_format_batches missing

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769#issuecomment-886036316


   > All of those issues should, in theory, disappear when Rust/Arrow/Datafusion/Ballista is running the Spark show
   
   Indeed! I think this is @andygrove's vision as well.
   
   Thanks for the kind words.





[GitHub] [arrow-datafusion] yuribudilov commented on issue #769: arrow::util::pretty::pretty_format_batches missing

Posted by GitBox <gi...@apache.org>.
yuribudilov commented on issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769#issuecomment-885967353


   I appreciate your support, wonderful and quick! 
   FWIW - I have used Apache Spark heavily for a couple of years, and I am of the opinion that a Rust implementation of the great "Spark concept" should be the new ideal for the future. Most of the Spark issues I faced were related to the JVM: OO memory overheads, vast memory bloat, and many job crashes due to memory exhaustion and GC-related issues. Performance was often far from great too. All of those issues should, in theory, disappear when Rust/Arrow/Datafusion/Ballista is running the Spark show. Bring it on. Thank you.





[GitHub] [arrow-datafusion] yuribudilov commented on issue #769: arrow::util::pretty::pretty_format_batches missing

Posted by GitBox <gi...@apache.org>.
yuribudilov commented on issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769#issuecomment-885302086


   OK, I fixed it, thanks to the Rust compiler (what a fantastic language!!)
   
   The Rust errors suggested that two different versions of the arrow crate were being used.
   
   So I tried using an earlier arrow version in TOML:
   
   arrow = { version = "4.4.0", features = ["prettyprint", "default"] }
   
   This compiles, builds and runs correctly! Phew! Happy days.
   
   May I humbly suggest that there is likely a buglet in datafusion 4.0.0, in arrow 5.0, or in both?
   
   May I also suggest updating the datafusion documentation to list a more complete set of TOML dependencies? Those of us who are new to arrow/datafusion and would like to learn could use more help, and reliable, accessible documentation is all we have.
   
   Many thanks for reading thus far, it looks like a fantastic product you have been building! 
   Please feel free to close this issue.
   





[GitHub] [arrow-datafusion] alamb commented on issue #769: arrow::util::pretty::pretty_format_batches missing

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769#issuecomment-885650360


   ```
   arrow = { version = "4.4.0", features = ["prettyprint", "default"] }
   ```
   
   Yes, this is the version of arrow that the (released) datafusion version 4.0 works with. 👍 
   
   The fact that we haven't released a new version of datafusion to crates.io that works with arrow 5 is a problem which we should rectify. 
   
   DataFusion (at least on master) also includes a "public export" of its arrow dependency, so perhaps we should change the example from
   
   ```rust
   use arrow::record_batch::RecordBatch;
   use arrow::util::pretty::print_batches;
   ```
   
   to 
   
   ```rust
   use datafusion::arrow::record_batch::RecordBatch;
   use datafusion::arrow::util::pretty::print_batches;
   ```
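
To illustrate why importing through the re-export avoids the "two different versions of crate `arrow`" error, here is a minimal, self-contained sketch. It does not use the real crates: the modules `arrow_v4`, `arrow_v5` and `datafusion_sim`, the `num_rows` field, and the helpers `total_rows` and `same_type` are all hypothetical stand-ins, and the real crate internals are assumed rather than quoted.

```rust
use std::any::TypeId;

// Hypothetical stand-ins for two releases of the `arrow` crate.
// The struct definitions are identical, yet Rust treats them as distinct types.
pub mod arrow_v4 {
    #[derive(Debug)]
    pub struct RecordBatch {
        pub num_rows: usize,
    }
}

pub mod arrow_v5 {
    #[derive(Debug)]
    pub struct RecordBatch {
        pub num_rows: usize,
    }
}

// Hypothetical stand-in for `datafusion`: built against the older release
// and re-exporting it, like the "public export" described above.
pub mod datafusion_sim {
    pub use super::arrow_v4 as arrow;

    pub fn collect() -> Vec<arrow::RecordBatch> {
        vec![arrow::RecordBatch { num_rows: 3 }]
    }
}

// Importing through the re-export always names the type `collect` returns.
use datafusion_sim::arrow::RecordBatch;

fn total_rows(batches: &[RecordBatch]) -> usize {
    batches.iter().map(|b| b.num_rows).sum()
}

// True only if both type parameters name the very same type.
fn same_type<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}

fn main() {
    let results: Vec<RecordBatch> = datafusion_sim::collect();
    println!("total rows: {}", total_rows(&results));

    // The re-exported path and the older release are the same type ...
    assert!(same_type::<RecordBatch, arrow_v4::RecordBatch>());
    // ... but the newer release is a different type despite the same name.
    assert!(!same_type::<RecordBatch, arrow_v5::RecordBatch>());

    // This line would fail to compile with a mismatched-types error:
    // let wrong: Vec<arrow_v5::RecordBatch> = datafusion_sim::collect();
}
```

The point of the sketch is that a user who only ever names `datafusion_sim::arrow::...` cannot end up with a second, incompatible copy of the type, whatever version is listed elsewhere in the manifest.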
   
   > Many thanks for reading thus far, it looks like a fantastic product you have been building!
   
   Thanks! Kudos go to the whole team (there are many people whose work goes into making it)
   





[GitHub] [arrow-datafusion] alamb edited a comment on issue #769: arrow::util::pretty::pretty_format_batches missing

Posted by GitBox <gi...@apache.org>.
alamb edited a comment on issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769#issuecomment-884985245


   Hi @yuribudilov  -- you need to enable the "prettyprint" feature for arrow.
   
   So instead of 
   ```toml
   arrow = "5.0.0"
   ```
   
   try using
   ```toml
   arrow = { version = "5.0", features = ["prettyprint"] }
   ```
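
For readers wondering why a missing feature makes a whole module "disappear", the following is a rough sketch of how Cargo feature gating generally works. It is not the arrow source: the module layout, `format_rows`, `render` and `prettyprint_enabled` are hypothetical names chosen for illustration, and only the feature name `prettyprint` comes from the thread.

```rust
// Hypothetical stand-in for `arrow::util`; the layout is assumed for illustration.
pub mod util {
    // This module exists only when the crate is built with the
    // `prettyprint` feature enabled; otherwise the compiler never sees it.
    #[cfg(feature = "prettyprint")]
    pub mod pretty {
        pub fn format_rows(rows: &[i64]) -> String {
            rows.iter()
                .map(|r| format!("| {} |", r))
                .collect::<Vec<_>>()
                .join("\n")
        }
    }
}

// Reports at run time whether the feature was enabled at compile time.
fn prettyprint_enabled() -> bool {
    cfg!(feature = "prettyprint")
}

#[cfg(feature = "prettyprint")]
fn render(rows: &[i64]) -> String {
    util::pretty::format_rows(rows)
}

#[cfg(not(feature = "prettyprint"))]
fn render(rows: &[i64]) -> String {
    // `util::pretty` cannot be named here: it was not compiled in.
    format!("{} rows (util::pretty is not compiled in)", rows.len())
}

fn main() {
    println!("prettyprint enabled: {}", prettyprint_enabled());
    println!("{}", render(&[1, 2, 3]));
}
```

Built without the feature, any `use util::pretty::...` in this sketch would fail with an unresolved-import error much like the one reported in this issue. With Cargo, the feature is switched on from the dependent crate's manifest, as in the `features = ["prettyprint"]` line above.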





[GitHub] [arrow-datafusion] alamb commented on issue #769: arrow::util::pretty::pretty_format_batches missing

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769#issuecomment-885887095


   I made https://github.com/apache/arrow-datafusion/pull/772 to try and improve the docs a little bit





[GitHub] [arrow-datafusion] yuribudilov commented on issue #769: arrow::util::pretty::pretty_format_batches missing

Posted by GitBox <gi...@apache.org>.
yuribudilov commented on issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769#issuecomment-885293375


   Thank you.
   
   One compilation error is now gone, but it has been replaced by two others: one step forward, two steps back.
   
   Repro:
   
   The Rust code sample given at https://github.com/apache/arrow-datafusion (quoted below) does not compile:
   
   use arrow::record_batch::RecordBatch;
   use arrow::util::pretty::print_batches;
   use datafusion::prelude::*;
   
   #[tokio::main]
   async fn main() -> datafusion::error::Result<()> {
       // register the table
       let mut ctx = ExecutionContext::new();
       ctx.register_csv("example", "tests/example.csv", CsvReadOptions::new())?;
   
       // create a plan to run a SQL query
       let df = ctx.sql("SELECT a, MIN(b) FROM example GROUP BY a LIMIT 100")?;
   
       // execute and print results
       let results: Vec<RecordBatch> = df.collect().await?; // error 1 here
       print_batches(&results)?; // error 2 here
       Ok(())
   }
   
   The TOML at that link shows only one line: datafusion = "4.0.0-SNAPSHOT"
   
   That TOML does not work on its own because it has no arrow or tokio dependency.
   So I added those myself.
   
   Here is what I have now, which still does not work:
   [package]
   name = "test_arrow"
   version = "0.1.0"
   edition = "2018"
   [dependencies]
   # arrow = "5.0.0"
   datafusion = "4.0.0"
   tokio = "1.8.2"
   arrow = { version = "5.0", features = ["prettyprint"] }
   
   
   I still get 2 compilation errors with the above:
   
   15 |     let results: Vec<RecordBatch> = df.collect().await?; // error 1
      |                                     ^^^^^^^^^^^^^^^^^^^ expected struct `arrow::record_batch::RecordBatch`, found a different struct `arrow::record_batch::RecordBatch`
      |
      = note: expected struct `Vec<arrow::record_batch::RecordBatch>` (struct `arrow::record_batch::RecordBatch`)
                 found struct `Vec<arrow::record_batch::RecordBatch>` (struct `arrow::record_batch::RecordBatch`)
      = note: perhaps two different versions of crate `arrow` are being used?
   note: return type inferred to be `Vec<arrow::record_batch::RecordBatch>` here
     --> src\main.rs:9:5
      |
   9  |     ctx.register_csv("example", "tests/example.csv", CsvReadOptions::new())?;
      |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   
   error[E0277]: `?` couldn't convert the error to `DataFusionError`
     --> src\main.rs:16:28
      |
   16 |     print_batches(&results)?; // error 2
      |                            ^ the trait `From<arrow::error::ArrowError>` is not implemented for `DataFusionError`
      |
      = note: the question mark operation (`?`) implicitly performs a conversion on the error value using the `From` trait
      = help: the following implementations were found:
                <DataFusionError as From<arrow::error::ArrowError>>
                <DataFusionError as From<parquet::errors::ParquetError>>
                <DataFusionError as From<sqlparser::parser::ParserError>>
                <DataFusionError as From<std::io::Error>>
      = note: required by `from`
   
   error: aborting due to 2 previous errors
   
   The first one can be "covered up" by letting Rust infer the data type like so (which is very odd, given that it infers the same Vec<RecordBatch>!):
   let results = df.collect().await?;
   
   The second error indicates that something is wrong with the documented TOML:
   print_batches(&results)?;
   
   Can you please point me to documentation on how to use this product from Rust?
   Many thanks.
   





[GitHub] [arrow-datafusion] alamb commented on issue #769: arrow::util::pretty::pretty_format_batches missing

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769#issuecomment-884985245


   Hi @yuribudilov  -- you need to enable the "prettyprint" feature for arrow.
   
   So instead of 
   ```toml
   arrow = "5.0.0"
   ```
   
   try using
   ```toml
   arrow = { version = "5.0", features = ["prettyprint"] }
   ```

