Posted to issues@arrow.apache.org by "Thomas Mock (Jira)" <ji...@apache.org> on 2022/06/07 13:46:00 UTC

[jira] [Created] (ARROW-16777) printing data in Table/RecordBatch print method

Thomas Mock created ARROW-16777:
-----------------------------------

             Summary: printing data in Table/RecordBatch print method
                 Key: ARROW-16777
                 URL: https://issues.apache.org/jira/browse/ARROW-16777
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python, R
            Reporter: Thomas Mock


This is related to ARROW-16776, but after a brief discussion, Neal Richardson asked me to split the improvement request into separate issues.

When working with Arrow datasets and tables, I often want to interactively print, or "see", the results of a query or the first few rows of the data without having to collect it fully into memory.

It would be ideal if the Table/RecordBatch print methods lazily printed some of the data; currently, however, they show only the schema, without any data.

For example:

``` r
library(dplyr)
library(arrow)

mtcars %>% arrow::write_parquet("mtcars.parquet")
car_ds <- arrow::open_dataset("mtcars.parquet")

car_ds
#> FileSystemDataset with 1 Parquet file
#> mpg: double
#> cyl: double
#> disp: double
#> hp: double
#> drat: double
#> wt: double
#> qsec: double
#> vs: double
#> am: double
#> gear: double
#> carb: double
#> 
#> See $metadata for additional Schema metadata

car_ds %>%
  compute()
#> Table
#> 32 rows x 11 columns
#> $mpg <double>
#> $cyl <double>
#> $disp <double>
#> $hp <double>
#> $drat <double>
#> $wt <double>
#> $qsec <double>
#> $vs <double>
#> $am <double>
#> $gear <double>
#> $carb <double>
#> 
#> See $metadata for additional Schema metadata
```
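For comparison, the closest current equivalent is to take the first rows lazily with `head()` and then `collect()` only those rows. The sketch below, using the same `mtcars` data, also shows roughly what a data-aware print could look like. `print_with_preview()` is a hypothetical helper written for illustration, not part of the arrow package, and the temporary file path stands in for `mtcars.parquet`.

```r
library(dplyr)
library(arrow)

# Write mtcars to a temporary Parquet file and open it lazily as a Dataset
parquet_path <- tempfile(fileext = ".parquet")
write_parquet(mtcars, parquet_path)
car_ds <- open_dataset(parquet_path)

# Current workaround: limit to the first rows lazily, then collect only those rows
preview <- car_ds %>%
  head(5) %>%
  collect()
print(preview)

# Hypothetical helper (NOT part of arrow): print the existing schema-only
# summary, then the first `n` rows pulled into R as a data frame
print_with_preview <- function(x, n = 5) {
  print(x)
  cat("\nFirst", n, "rows:\n")
  print(collect(head(x, n)))
  invisible(x)
}

print_with_preview(car_ds)            # Dataset
print_with_preview(compute(car_ds))   # Table
```

Because `head()` is applied before `collect()`, only the requested rows are converted into an R data frame, which is the kind of preview this request asks the print methods to provide by default.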

--
This message was sent by Atlassian Jira
(v8.20.7#820007)