Posted to issues@beam.apache.org by "Ankur Goenka (Jira)" <ji...@apache.org> on 2021/07/15 18:16:00 UTC

[jira] [Commented] (BEAM-12531) ib.show does not handle deferred dataframe instances

    [ https://issues.apache.org/jira/browse/BEAM-12531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17381518#comment-17381518 ] 

Ankur Goenka commented on BEAM-12531:
-------------------------------------

Is this a blocker for the 2.32 release?

If not, please change the fix version.

> ib.show does not handle deferred dataframe instances
> ----------------------------------------------------
>
>                 Key: BEAM-12531
>                 URL: https://issues.apache.org/jira/browse/BEAM-12531
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-dataframe
>    Affects Versions: 2.31.0
>            Reporter: Brian Hulette
>            Assignee: Sam Rohde
>            Priority: P2
>             Fix For: 2.32.0
>
>          Time Spent: 6h
>  Remaining Estimate: 0h
>
> When passed a deferred dataframe instance (e.g. {{ib.show(counts.nlargest(20, keep='all'))}}), ib.show calls len() and ends up raising a WontImplementError:
> {code}
> ---------------------------------------------------------------------------
> WontImplementError                        Traceback (most recent call last)
> <ipython-input-9-56c2dd81898d> in <module>
> ----> 1 ib.show(counts.nlargest(20, keep='all'))
> 2 frames
> /usr/local/lib/python3.7/dist-packages/apache_beam/runners/interactive/utils.py in run_within_progress_indicator(*args, **kwargs)
>     245   def run_within_progress_indicator(*args, **kwargs):
>     246     with ProgressIndicator('Processing...', 'Done.'):
> --> 247       return func(*args, **kwargs)
>     248 
>     249   return run_within_progress_indicator
> /usr/local/lib/python3.7/dist-packages/apache_beam/runners/interactive/interactive_beam.py in show(include_window_info, visualize_data, n, duration, *pcolls)
>     441     else:
>     442       try:
> --> 443         flatten_pcolls.extend(iter(pcoll_container))
>     444       except TypeError:
>     445         raise ValueError(
> /usr/local/lib/python3.7/dist-packages/apache_beam/dataframe/frames.py in __len__(self)
>     695         "len(df) is not currently supported because it produces a non-deferred "
>     696         "result. Consider using df.length() instead.",
> --> 697         reason="non-deferred-result")
>     698 
>     699   @property  # type: ignore
> WontImplementError: len(df) is not currently supported because it produces a non-deferred result. Consider using df.length() instead.
> For more information see https://s.apache.org/dataframe-non-deferred-result.
> {code}
> We should support this case, or at least fail gracefully.
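
One way to read "fail gracefully" in the description above, as a minimal sketch and not the actual Beam fix: check for deferred dataframe types before iterating a container, so that len() is never triggered on them. FakeDeferredFrame, the local WontImplementError class, and flatten_containers are hypothetical stand-ins written for this illustration; they are not Beam APIs, and the real show() logic may differ.

```python
from typing import Any, Iterable, List, Tuple, Type


class WontImplementError(NotImplementedError):
    """Local stand-in for the error in the traceback (assumption, not Beam's class)."""


class FakeDeferredFrame:
    """Hypothetical stand-in for a deferred dataframe that refuses len()."""

    def __len__(self) -> int:
        raise WontImplementError(
            "len(df) is not currently supported because it produces a "
            "non-deferred result.")

    def __getitem__(self, key: Any) -> Any:
        raise WontImplementError("indexing is deferred in this stand-in")


def flatten_containers(
    items: Iterable[Any],
    deferred_types: Tuple[Type[Any], ...],
) -> List[Any]:
    """Flatten containers, passing deferred objects through untouched."""
    flat: List[Any] = []
    for item in items:
        # Check the type first so we never call iter() or len() on a
        # deferred object.
        if isinstance(item, deferred_types):
            flat.append(item)
            continue
        try:
            flat.extend(iter(item))
        except TypeError as exc:
            # Non-iterable input: report a clear error instead of leaking
            # an internal exception.
            raise ValueError(
                f"Expected a container or a deferred dataframe, got {item!r}"
            ) from exc
    return flat


if __name__ == "__main__":
    df = FakeDeferredFrame()
    result = flatten_containers([df, ["pcoll_a", "pcoll_b"]], (FakeDeferredFrame,))
    print(len(result))
```

In this sketch the deferred object is returned as a single element, so a later step could convert it (or reject it with a clear message) without ever evaluating its length.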



--
This message was sent by Atlassian Jira
(v8.3.4#803005)