Posted to jira@arrow.apache.org by "Will Jones (Jira)" <ji...@apache.org> on 2022/01/07 21:54:00 UTC

[jira] [Commented] (ARROW-14798) [Python] Limit the size of the repr for large Tables

    [ https://issues.apache.org/jira/browse/ARROW-14798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470902#comment-17470902 ] 

Will Jones commented on ARROW-14798:
------------------------------------

The thing about {{inner_window}} is that it only helps at one level. Another way to think about the problem is that we care more about seeing primitive elements than container elements. So we could instead have {{container_window}} and {{window}}, where the latter applies only to primitives.

Here is how {{window}}/{{inner_window}} would compare with {{container_window}}/{{window}}:
{code:python}
# window: 1, inner_window: 3
>>> pa.chunked_array([[list(range(10))] * 10] * 10)  # 10 chunks, 10 elements per chunk, 10 list elements per element
[[[0, 1, 2,...,7, 8, 9], [0, 1, 2,...,7, 8, 9], [0, 1, 2,...,7, 8, 9],...,[0, 1, 2,...,7, 8, 9], [0, 1, 2,...,7, 8, 9], [0, 1, 2,...,7, 8, 9]],
 ...,
[[0, 1, 2,...,7, 8, 9], [0, 1, 2,...,7, 8, 9], [0, 1, 2,...,7, 8, 9],...,[0, 1, 2,...,7, 8, 9], [0, 1, 2,...,7, 8, 9], [0, 1, 2,...,7, 8, 9]]]
# container_window: 1, window: 4
>>> pa.chunked_array([[list(range(10))] * 10] * 10)  # same data: 10 chunks, 10 elements per chunk, 10 list elements per element
[[[0, 1, 2, 3, ..., 6, 7, 8, 9], ..., [0, 1, 2, 3, ..., 6, 7, 8, 9]], ..., [[0, 1, 2, 3, ..., 6, 7, 8, 9], ..., [0, 1, 2, 3, ..., 6, 7, 8, 9]]]
{code}
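To make the difference between the two schemes concrete, here is a minimal pure-Python sketch that formats nested Python lists (standing in for a chunked array of lists) under each scheme. It is not pyarrow's formatter: {{format_inner_window}} and {{format_container_window}} are hypothetical names, and the sketch assumes each window counts the items kept at each end of a level, with {{...}} marking the elided middle.

{code:python}
# Hypothetical sketch: plain nested lists stand in for a chunked array of lists.
# This is not pyarrow's formatter, only an illustration of the two proposals.


def _elide(items, n):
    """Split items into (head, tail, truncated), keeping n items at each end."""
    if len(items) <= 2 * n:
        return list(items), [], False
    return list(items[:n]), list(items[-n:]), True


def _render(head, tail, truncated):
    """Join formatted parts, putting '...' where the middle was dropped."""
    parts = head + ["..."] + tail if truncated else head
    return "[" + ", ".join(parts) + "]"


def format_inner_window(value, window, inner_window, depth=0):
    """Scheme 1: `window` applies to the outermost level, `inner_window` to every level below."""
    if not isinstance(value, list):
        return repr(value)
    n = window if depth == 0 else inner_window
    head, tail, cut = _elide(value, n)

    def fmt(v):
        return format_inner_window(v, window, inner_window, depth + 1)

    return _render([fmt(v) for v in head], [fmt(v) for v in tail], cut)


def format_container_window(value, container_window, window):
    """Scheme 2: lists holding lists use `container_window`; lists of primitives use `window`."""
    if not isinstance(value, list):
        return repr(value)
    holds_containers = any(isinstance(v, list) for v in value)
    n = container_window if holds_containers else window
    head, tail, cut = _elide(value, n)

    def fmt(v):
        return format_container_window(v, container_window, window)

    return _render([fmt(v) for v in head], [fmt(v) for v in tail], cut)


data = [[list(range(10))] * 10] * 10  # 10 chunks x 10 lists x 10 ints
print(format_inner_window(data, window=1, inner_window=3))
print(format_container_window(data, container_window=1, window=4))
{code}

With this data the first scheme prints 12 inner lists (1 chunk at each end, 3 lists at each end of each chunk) of 6 integers each, while the second prints only 4 inner lists but 8 integers in each, matching the second output above.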

> [Python] Limit the size of the repr for large Tables
> ----------------------------------------------------
>
>                 Key: ARROW-14798
>                 URL: https://issues.apache.org/jira/browse/ARROW-14798
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Joris Van den Bossche
>            Assignee: Will Jones
>            Priority: Major
>              Labels: good-first-issue, pull-request-available
>             Fix For: 8.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The new repr is nice in that it shows a preview of the data, but for larger tables it can also become very long, flooding your console output.
> We already default to 10 preview cols, but each column can still consist of many chunks. So it might be good to also limit it to 2 chunks?
> The ChunkedArray.to_string method already has a {{window}} keyword, but that seems to control both the number of elements to show per chunk and the number of chunks (while it would be nice to limit e.g. to 2 chunks but show up to 10 elements for each chunk).
> cc [~amol-]
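The coupling the issue describes, where a single {{window}} limits both the number of chunks and the elements per chunk, can be illustrated with a minimal pure-Python sketch over a list of chunks. Plain Python lists stand in for pyarrow chunks; {{preview_coupled}} and {{preview_separate}} are hypothetical helpers, not pyarrow APIs, and the sketch assumes a window counts the items kept at each end.

{code:python}
# Hypothetical sketch: a list of plain Python lists stands in for a ChunkedArray.


def _preview(items, n, fmt):
    """Format items, keeping n at each end and replacing the middle with '...'."""
    if len(items) <= 2 * n:
        parts = [fmt(x) for x in items]
    else:
        parts = [fmt(x) for x in items[:n]] + ["..."] + [fmt(x) for x in items[-n:]]
    return "[" + ", ".join(parts) + "]"


def preview_coupled(chunks, window):
    """One window limits both the number of chunks and the elements per chunk."""
    return _preview(chunks, window, lambda chunk: _preview(chunk, window, repr))


def preview_separate(chunks, chunk_window, window):
    """Independent limits: chunk_window chunks and window elements at each end."""
    return _preview(chunks, chunk_window, lambda chunk: _preview(chunk, window, repr))


chunks = [list(range(20))] * 8  # 8 chunks of 20 integers each
print(preview_coupled(chunks, 1))      # 2 chunks, but only 2 elements per chunk
print(preview_coupled(chunks, 5))      # 10 elements per chunk, but all 8 chunks
print(preview_separate(chunks, 1, 5))  # 2 chunks, 10 elements per chunk
{code}

With a single window there is no setting that yields "2 chunks, 10 elements each"; separating the two limits makes that combination expressible.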



--
This message was sent by Atlassian Jira
(v8.20.1#820001)