Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2022/12/12 13:59:00 UTC

[jira] [Commented] (ARROW-18359) PrettyPrint Improvements

    [ https://issues.apache.org/jira/browse/ARROW-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646112#comment-17646112 ] 

Dewey Dunnington commented on ARROW-18359:
------------------------------------------

I'm not sure whether this is covered by one of the subtasks, but binary arrays with really huge elements take a very long time to print. My guess is that the printer converts the entire binary element to a string before selecting the few characters that will actually be shown:

{code:R}
library(arrow)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp

really_big_raw <- raw(1e9)
really_big_binary <- Array$create(list(really_big_raw), type = binary())
system.time(really_big_binary$ToString())
#>    user  system elapsed 
#>  12.396   1.660  14.269
{code}


(I ran into this because the current encoding for geospatial data in Parquet files is {{binary()}}, and the elements can be huge.)
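
To illustrate the guess above, here is a minimal base R sketch of slicing the bytes before formatting them, so that the cost depends on the preview width rather than on the element size. This is not Arrow's actual printing code: {{preview_raw()}} is a hypothetical helper, and the lowercase hex rendering and the "..." suffix are assumptions chosen only for illustration.

```r
# Hypothetical helper (not part of the arrow package): build a short preview
# of a raw vector by taking the first `n` bytes *before* converting to text.
preview_raw <- function(x, n = 8L) {
  stopifnot(is.raw(x), n >= 0)
  # Slice first: only `n` bytes are ever converted to characters.
  shown <- utils::head(x, n)
  # as.character() on a raw vector yields two-digit lowercase hex strings.
  hex <- paste(as.character(shown), collapse = "")
  # Mark truncation when the element is longer than the preview.
  if (length(x) > n) paste0(hex, "...") else hex
}

small <- as.raw(c(0x01, 0xab, 0xff))
preview_raw(small)

# The work done here is bounded by `n`, not by length(big).
big <- raw(1e6)
preview_raw(big, n = 4L)
```

If the slow path really is "format everything, then truncate", something along these lines inside the printer would avoid touching most of the buffer.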

> PrettyPrint Improvements
> ------------------------
>
>                 Key: ARROW-18359
>                 URL: https://issues.apache.org/jira/browse/ARROW-18359
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python, R
>            Reporter: Will Jones
>            Priority: Major
>
> We have some pretty printing capabilities, but we may want to think at a high level about the design first.


