Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/18 17:12:25 UTC
[GitHub] [arrow] wjones127 commented on a change in pull request #12148: ARROW-15329: [Python] Add character limit to Table.to_string()
wjones127 commented on a change in pull request #12148:
URL: https://github.com/apache/arrow/pull/12148#discussion_r786980475
##########
File path: python/pyarrow/table.pxi
##########
@@ -1342,10 +1344,11 @@ cdef class Table(_PandasConvertible):
         if preview_cols:
             pieces.append('----')
             for i in range(min(self.num_columns, preview_cols)):
-                pieces.append('{}: {}'.format(
-                    self.field(i).name,
-                    self.column(i).to_string(indent=0, skip_new_lines=True)
-                ))
+                col_string = self.column(i).to_string(
+                    indent=0, skip_new_lines=True)
+                if len(col_string) > cols_char_limit:
+                    col_string = col_string[:(cols_char_limit - 3)] + '...'
+                pieces.append('{}: {}'.format(self.field(i).name, col_string))
Review comment:
Thanks for the feedback. I've implemented a very basic version of this for now, and it looks pretty good for this example:
```python
>>> from random import sample, choice
>>> import pyarrow as pa
>>> arr_int = pa.array(range(50))
>>> tree_parts = ["roots", "trunk", "crown", "seeds"]
>>> arr_list = pa.array([sample(tree_parts, k=choice(range(len(tree_parts)))) for _ in range(50)])
>>> arr_struct = pa.StructArray.from_arrays([arr_int, arr_list], names=['int_nested', 'list_nested'])
>>> arr_map = pa.array(
... [
... [(part, choice(range(10))) for part in sample(tree_parts, k=choice(range(len(tree_parts))))]
... for _ in range(50)
... ],
... type=pa.map_(pa.utf8(), pa.int64())
... )
>>> table = pa.table({
... 'int': pa.chunked_array([arr_int] * 10),
... 'list': pa.chunked_array([arr_list] * 10),
... 'struct': pa.chunked_array([arr_struct] * 10),
... 'map': pa.chunked_array([arr_map] * 10),
... })
>>> print(table)
pyarrow.Table
int: int64
list: list<item: string>
child 0, item: string
struct: struct<int_nested: int64, list_nested: list<item: string>>
child 0, int_nested: int64
child 1, list_nested: list<item: string>
child 0, item: string
map: map<string, int64>
child 0, entries: struct<key: string not null, value: int64> not null
child 0, key: string not null
child 1, value: int64
----
int: [[0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49],[0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49],[0,1,2,3,...]...]
list: [[["seeds","trunk","roots"],["trunk","crown"],["crown"],["trunk"],["crown"],[],["roots","seeds"],["roots"],["trunk","roots"]...]...]
struct: [ -- is_valid: all not null -- child 0 type: int64
[
0,
1,
2,
3,
4,
5,
6,...]...]
map: [[ keys:["seeds","crown","trunk"]values:[7,8,7], keys:["roots","crown"]values:[8,4], keys:["crown","roots","trunk"]...]...]
```
Unfortunately, it behaves badly for string columns whose values contain `[`, since those characters get counted as open brackets too. For example:
```python
>>> pa.table({'x': pa.array(["[" * 100] * 500)})
x: [["[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[","[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[",...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]
...]...]...]...]...]...]
```
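The previews above look as if the truncated text is closed by appending one `...]` for every `[` left open, which would also explain the runaway output: brackets inside the string values are counted along with the list brackets. A minimal sketch of such a rule, assuming that is roughly what the patch does (the quoted hunk only shows the `'...'` cut, and `close_truncated` is a hypothetical name):

```python
import json


def close_truncated(text: str, limit: int) -> str:
    """Hypothetical sketch: cut `text` to fit `limit`, then close brackets.

    Closing brackets are inferred by counting '[' and ']' in the already
    formatted text, so list brackets and '[' inside strings look the same.
    """
    if len(text) <= limit:
        return text
    truncated = text[: limit - 3]  # same cut as the diff: leave room for '...'
    unclosed = truncated.count("[") - truncated.count("]")
    return truncated + "...]" * max(unclosed, 0)


# Well-behaved case: only list brackets, so one '...]' per open level.
print(close_truncated("[[1,2],[3,4]]", 10))  # [[1,2],...]

# Strings full of '['; compact JSON mimics the preview format here.
text = json.dumps([["[" * 100] * 2], separators=(",", ":"))
result = close_truncated(text, 60)
print(result.count("...]"))  # 56: every '[' in the cut-off prefix gets "closed"
```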
I think this kind of behavior is unavoidable as long as the limit is applied to the already-formatted string; fixing it properly means pushing the limit into the PrettyPrinter implementation itself.
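Applying the limit inside the printer means the budget is checked while the data is being walked, so the closing brackets come from the actual nesting depth rather than from scanning text. A rough Python illustration of that idea over plain nested lists (`format_limited` is hypothetical and is not Arrow's PrettyPrinter API; the budget is checked before each element, so output can overshoot by one element plus the closing brackets):

```python
import json
from typing import Any, List


def format_limited(value: Any, limit: int) -> str:
    """Hypothetical sketch: format nested lists compactly, stopping once
    about `limit` characters have been emitted.

    Brackets are emitted while walking the data, so each '...]' matches a
    real open list and '[' inside strings is never mistaken for one.
    """
    out: List[str] = []
    size = 0
    truncated = False

    def emit(piece: str) -> None:
        nonlocal size
        out.append(piece)
        size += len(piece)

    def walk(v: Any) -> None:
        nonlocal truncated
        if truncated:
            return
        if size >= limit:  # budget exhausted before this element
            truncated = True
            return
        if isinstance(v, list):
            emit("[")
            for i, item in enumerate(v):
                if i:
                    emit(",")
                walk(item)
                if truncated:
                    break
            # Close this level exactly once, marking it as cut off if needed.
            emit("...]" if truncated else "]")
        else:
            emit(json.dumps(v))  # scalars are emitted whole

    walk(value)
    return "".join(out)


print(format_limited([[0, 1, 2], [3, 4, 5]], 12))  # [[0,1,2],[3,...]...]
```

With the `[`-heavy strings from the example, only the two real list levels are closed: `format_limited([["[" * 100] * 5], 50)` ends in `",...]...]`.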