Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/22 07:19:54 UTC
[GitHub] [arrow-datafusion] yuribudilov opened a new issue #769: arrow::util::pretty::pretty_format_batches missing
yuribudilov opened a new issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769
Hello,
My apologies for a novice Arrow question.
I am not able to compile the code sample because the "pretty" module appears to be missing from arrow::util.
I am using Rust 1.53.0 stable.
My Cargo.toml is:
[package]
name = "test_arrow"
version = "0.1.0"
edition = "2018"
[dependencies]
arrow = "5.0.0"
datafusion = "4.0.0"
tokio = "1.8.2"
// the compiler cannot find this:
use arrow::util::pretty::print_batches;
// this also fails to compile:
let pretty_results = arrow::util::pretty::pretty_format_batches(&results)?;
Error: cannot find 'pretty' in util.
What am I doing wrong, please?
Thank you very much.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb closed issue #769: arrow::util::pretty::pretty_format_batches missing
Posted by GitBox <gi...@apache.org>.
alamb closed issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769
[GitHub] [arrow-datafusion] alamb commented on issue #769: arrow::util::pretty::pretty_format_batches missing
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769#issuecomment-886036316
> All of those issues should, in theory, disappear when Rust/Arrow/Datafusion/Ballista is running the Spark show
Indeed! I think this is @andygrove 's vision as well.
Thanks for the kind words.
[GitHub] [arrow-datafusion] yuribudilov commented on issue #769: arrow::util::pretty::pretty_format_batches missing
Posted by GitBox <gi...@apache.org>.
yuribudilov commented on issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769#issuecomment-885967353
I appreciate your support, wonderful and quick!
FWIW, I have used Apache Spark heavily for a couple of years, and in my opinion a Rust implementation of the great "Spark concept" should be the new ideal for the future. Most of the Spark issues I faced were related to the JVM: OO memory overhead, vast memory bloat, and many job crashes due to memory exhaustion and GC-related issues. Performance was often far from great too. All of those issues should, in theory, disappear when Rust/Arrow/Datafusion/Ballista is running the Spark show. Bring it on. Thank you.
[GitHub] [arrow-datafusion] yuribudilov commented on issue #769: arrow::util::pretty::pretty_format_batches missing
Posted by GitBox <gi...@apache.org>.
yuribudilov commented on issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769#issuecomment-885302086
OK, I fixed it, thanks to the Rust compiler (what a fantastic language!!)
The Rust error messages suggested that two different versions of the arrow crate were being used.
So I tried an earlier arrow version in Cargo.toml:
arrow = { version = "4.4.0", features = ["prettyprint", "default"] }
This compiles, builds, and runs correctly! Phew! Happy days.
May I humbly suggest that there is likely a buglet in either datafusion 4.0.0 or arrow 5.0, or both?
May I also suggest updating the datafusion documentation to list more complete TOML dependencies? Those of us who are new to arrow/datafusion and would like to learn could use more help, and reliable, accessible documentation is all we have.
Many thanks for reading thus far, it looks like a fantastic product you have been building!
Please feel free to close this issue.
[GitHub] [arrow-datafusion] alamb commented on issue #769: arrow::util::pretty::pretty_format_batches missing
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769#issuecomment-885650360
```
arrow = { version = "4.4.0", features = ["prettyprint", "default"] }
```
Yes, this is the version of arrow that the (released) datafusion version 4.0 works with. 👍
The fact that we haven't released a new version of datafusion to crates.io that works with arrow 5 is a problem which we should rectify.
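Until such a release exists, a complete manifest for the combination confirmed above might look like the sketch below. The package section is copied from the original report. The tokio feature list is an assumption that is not stated anywhere in this thread: `#[tokio::main]` needs the macro and a runtime to be enabled, and spelling them out avoids relying on another crate to switch them on.

```toml
[package]
name = "test_arrow"
version = "0.1.0"
edition = "2018"

[dependencies]
# arrow 4.4 is the version the released datafusion 4.0.0 works with (per this thread)
arrow = { version = "4.4.0", features = ["prettyprint"] }
datafusion = "4.0.0"
# Assumption: explicit features so that #[tokio::main] is available
tokio = { version = "1.8.2", features = ["macros", "rt-multi-thread"] }
```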
DataFusion (at least on master) also includes a "public export" of its arrow dependency, so perhaps we should change the example from
```rust
use arrow::record_batch::RecordBatch;
use arrow::util::pretty::print_batches;
```
to
```rust
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::arrow::util::pretty::print_batches;
```
> Many thanks for reading thus far, it looks like a fantastic product you have been building!
Thanks! Kudos go to the whole team (there are many people whose work goes into making it)
[GitHub] [arrow-datafusion] alamb edited a comment on issue #769: arrow::util::pretty::pretty_format_batches missing
Posted by GitBox <gi...@apache.org>.
alamb edited a comment on issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769#issuecomment-884985245
Hi @yuribudilov -- you need to enable the "prettyprint" feature for arrow.
So instead of
```toml
arrow = "5.0.0"
```
try using
```toml
arrow = { version = "5.0", features = ["prettyprint"] }
```
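For readers new to Cargo features, the following std-only sketch shows the general mechanism behind the "cannot find 'pretty'" error: a module guarded by a feature flag simply does not exist in builds where the feature is off. The module layout and the function names `pretty_format`, `pretty_available`, and `render` are hypothetical and only mimic the idea; this is not arrow's actual source.

```rust
// Hypothetical miniature of a crate with an optional "prettyprint" feature.
pub mod util {
    // Compiled only when the feature is on; otherwise the path
    // util::pretty does not exist at all.
    #[cfg(feature = "prettyprint")]
    pub mod pretty {
        pub fn pretty_format(rows: &[&str]) -> String {
            rows.join("\n")
        }
    }
}

/// Reports whether this build was compiled with the feature enabled.
pub fn pretty_available() -> bool {
    cfg!(feature = "prettyprint")
}

// With the feature: delegate to the gated module.
#[cfg(feature = "prettyprint")]
pub fn render(rows: &[&str]) -> String {
    util::pretty::pretty_format(rows)
}

// Without the feature: the gated module cannot be named, so fall back.
#[cfg(not(feature = "prettyprint"))]
pub fn render(rows: &[&str]) -> String {
    format!(
        "{} row(s); enable the \"prettyprint\" feature to format them",
        rows.len()
    )
}

fn main() {
    println!("prettyprint enabled: {}", pretty_available());
    println!("{}", render(&["a", "b"]));
}
```

In a real project the flag is switched on from the dependent crate's Cargo.toml, as in the `features = ["prettyprint"]` line above.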
[GitHub] [arrow-datafusion] alamb commented on issue #769: arrow::util::pretty::pretty_format_batches missing
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769#issuecomment-885887095
I made https://github.com/apache/arrow-datafusion/pull/772 to try and improve the docs a little bit
[GitHub] [arrow-datafusion] yuribudilov commented on issue #769: arrow::util::pretty::pretty_format_batches missing
Posted by GitBox <gi...@apache.org>.
yuribudilov commented on issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769#issuecomment-885293375
Thank you.
One compilation error is now gone, but it has been replaced by two others: one step forward, two steps back.
Repro:
On https://github.com/apache/arrow-datafusion there is a Rust code sample (quoted below), which does not compile:
use arrow::record_batch::RecordBatch;
use arrow::util::pretty::print_batches;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // register the table
    let mut ctx = ExecutionContext::new();
    ctx.register_csv("example", "tests/example.csv", CsvReadOptions::new())?;

    // create a plan to run a SQL query
    let df = ctx.sql("SELECT a, MIN(b) FROM example GROUP BY a LIMIT 100")?;

    // execute and print results
    let results: Vec<RecordBatch> = df.collect().await?; // error 1 here
    print_batches(&results)?; // error 2 here
    Ok(())
}
The TOML at that link shows only one line: datafusion = "4.0.0-SNAPSHOT"
That is not enough, because it lists neither an arrow nor a tokio dependency.
So I added those myself.
Here is what I have now, which still does not work:
[package]
name = "test_arrow"
version = "0.1.0"
edition = "2018"
[dependencies]
# arrow = "5.0.0"
datafusion = "4.0.0"
tokio = "1.8.2"
arrow = { version = "5.0", features = ["prettyprint"] }
I still get 2 compilation errors with the above:
15 | let results: Vec<RecordBatch> = df.collect().await?; // error 1
| ^^^^^^^^^^^^^^^^^^^ expected struct `arrow::record_batch::RecordBatch`, found a different struct `arrow::record_batch::RecordBatch`
|
= note: expected struct `Vec<arrow::record_batch::RecordBatch>` (struct `arrow::record_batch::RecordBatch`)
found struct `Vec<arrow::record_batch::RecordBatch>` (struct `arrow::record_batch::RecordBatch`)
= note: perhaps two different versions of crate `arrow` are being used?
note: return type inferred to be `Vec<arrow::record_batch::RecordBatch>` here
--> src\main.rs:9:5
|
9 | ctx.register_csv("example", "tests/example.csv", CsvReadOptions::new())?;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
error[E0277]: `?` couldn't convert the error to `DataFusionError`
--> src\main.rs:16:28
|
16 | print_batches(&results)?; // error 2
| ^ the trait `From<arrow::error::ArrowError>` is not implemented for `DataFusionError`
|
= note: the question mark operation (`?`) implicitly performs a conversion on the error value using the `From` trait
= help: the following implementations were found:
<DataFusionError as From<arrow::error::ArrowError>>
<DataFusionError as From<parquet::errors::ParquetError>>
<DataFusionError as From<sqlparser::parser::ParserError>>
<DataFusionError as From<std::io::Error>>
= note: required by `from`
error: aborting due to 2 previous errors
The first one can be "covered up" by letting Rust infer the type, like so (which is very odd, given that it infers the same Vec<RecordBatch>!):
let results = df.collect().await?;
The second error suggests that something is wrong with the documented TOML:
print_batches(&results)?;
Can you please point me to documentation on how to use this product from Rust?
Many thanks.
[GitHub] [arrow-datafusion] alamb commented on issue #769: arrow::util::pretty::pretty_format_batches missing
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #769:
URL: https://github.com/apache/arrow-datafusion/issues/769#issuecomment-884985245
Hi @yuribudilov -- you need to enable the "prettyprint" feature for arrow.
So instead of
```toml
arrow = "5.0.0"
```
try using
```toml
arrow = { version = "5.0", features = ["prettyprint"] }
```