Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/18 17:12:25 UTC

[GitHub] [arrow] wjones127 commented on a change in pull request #12148: ARROW-15329: [Python] Add character limit to Table.to_string()

wjones127 commented on a change in pull request #12148:
URL: https://github.com/apache/arrow/pull/12148#discussion_r786980475



##########
File path: python/pyarrow/table.pxi
##########
@@ -1342,10 +1344,11 @@ cdef class Table(_PandasConvertible):
         if preview_cols:
             pieces.append('----')
             for i in range(min(self.num_columns, preview_cols)):
-                pieces.append('{}: {}'.format(
-                    self.field(i).name,
-                    self.column(i).to_string(indent=0, skip_new_lines=True)
-                ))
+                col_string = self.column(i).to_string(
+                    indent=0, skip_new_lines=True)
+                if len(col_string) > cols_char_limit:
+                    col_string = col_string[:(cols_char_limit - 3)] + '...'
+                pieces.append('{}: {}'.format(self.field(i).name, col_string))

Review comment:
       Thanks for the feedback. I implemented a very basic version of this for now. The output looks pretty good for this example:
   
   ```python
   >>> from random import sample, choice
   >>> import pyarrow as pa
   >>> arr_int = pa.array(range(50))
   >>> tree_parts = ["roots", "trunk", "crown", "seeds"]
   >>> arr_list = pa.array([sample(tree_parts, k=choice(range(len(tree_parts)))) for _ in range(50)])
   >>> arr_struct = pa.StructArray.from_arrays([arr_int, arr_list], names=['int_nested', 'list_nested'])
   >>> arr_map = pa.array(
   ...     [
   ...         [(part, choice(range(10))) for part in sample(tree_parts, k=choice(range(len(tree_parts))))]
   ...         for _ in range(50)
   ...     ],
   ...     type=pa.map_(pa.utf8(), pa.int64())
   ... )
   >>> table = pa.table({
   ...     'int': pa.chunked_array([arr_int] * 10),
   ...     'list': pa.chunked_array([arr_list] * 10),
   ...     'struct': pa.chunked_array([arr_struct] * 10),
   ...     'map': pa.chunked_array([arr_map] * 10),
   ... })
   >>> print(table)
   pyarrow.Table
   int: int64
   list: list<item: string>
     child 0, item: string
   struct: struct<int_nested: int64, list_nested: list<item: string>>
     child 0, int_nested: int64
     child 1, list_nested: list<item: string>
         child 0, item: string
   map: map<string, int64>
     child 0, entries: struct<key: string not null, value: int64> not null
         child 0, key: string not null
         child 1, value: int64
   ----
   int: [[0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49],[0,1,2,3,4,5,6,7,8,9,...,40,41,42,43,44,45,46,47,48,49],[0,1,2,3,...]...]
   list: [[["seeds","trunk","roots"],["trunk","crown"],["crown"],["trunk"],["crown"],[],["roots","seeds"],["roots"],["trunk","roots"]...]...]
   struct: [  -- is_valid: all not null  -- child 0 type: int64
       [
         0,
         1,
         2,
         3,
         4,
         5,
         6,...]...]
   map: [[    keys:["seeds","crown","trunk"]values:[7,8,7],    keys:["roots","crown"]values:[8,4],    keys:["crown","roots","trunk"]...]...]
   ```
   
   
   Unfortunately, it behaves badly for string columns whose values contain `[`. For example:
   
   ```python
   >>> pa.table({'x': pa.array(["[" * 100] * 500)})
   x: [["[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[","[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[",...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]...]
 ...]...]...]...]...]...]
   ```
   
   I think that kind of behavior is pretty unavoidable until we push this limit into the PrettyPrinter implementation itself.
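
To illustrate why the output above ends in a long run of `...]`, here is a minimal sketch. It assumes the truncation cuts the string at the character limit and then appends one `...]` for every `[` left unmatched in the kept prefix. `truncate_and_close` is a hypothetical helper written for this illustration, not the code in this PR; it only mirrors the pattern visible in the printed output.

```python
def truncate_and_close(text: str, limit: int) -> str:
    """Hypothetical helper (an assumption, not the PR's implementation).

    Cut `text` at `limit` characters, then append one '...]' per '['
    that is still open in the kept prefix.
    """
    if len(text) <= limit:
        return text

    prefix = text[:limit]
    depth = 0
    for ch in prefix:
        # Naive counting: brackets inside quoted string values are
        # indistinguishable from structural brackets at this level.
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth = max(depth - 1, 0)
    return prefix + "...]" * depth


# Structural brackets only: two unmatched '[' give two '...]' suffixes.
print(truncate_and_close("[[0,1,2,3,4,5]]", 9))

# A string value made of '[' inflates the count: the kept prefix holds
# two structural brackets plus four inside the quoted value.
print(truncate_and_close('[["[[[[","[[[["]]', 8))
```

Because the helper only sees the already formatted string, it cannot tell data from structure. That is consistent with the point above that the limit would need to live in the pretty printer, where the nesting depth is known directly.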
   



