Posted to commits@beam.apache.org by "Kenneth Knowles (JIRA)" <ji...@apache.org> on 2018/07/02 21:57:00 UTC

[jira] [Created] (BEAM-4719) Enhanced LIMIT support

Kenneth Knowles created BEAM-4719:
-------------------------------------

             Summary: Enhanced LIMIT support
                 Key: BEAM-4719
                 URL: https://issues.apache.org/jira/browse/BEAM-4719
             Project: Beam
          Issue Type: New Feature
          Components: dsl-sql
            Reporter: Kenneth Knowles


Currently, Beam SQL supports LIMIT in two ways:

1. Within a query, the results are subject to LIMIT. This works.
2. The shell knows to cancel a pipeline when the limit is reached, even if there is unfinished unbounded data.

Pipeline cancellation works via a basic pattern match against the query execution plan: it checks a few child nodes of the BeamEnumerableConverter for a BeamSortRel without a collation. If it can determine the limit for the outermost query, it cancels the pipeline once that limit is reached.

A more robust approach might be to use traits (or some other thorough analysis) to see if there is a known size for the outermost query. This would, for example, be unaffected by any number of layers of non-size-changing transformations.
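
To make the idea concrete, here is a rough sketch of deriving a known size by walking the whole plan instead of matching a fixed number of child nodes. It is a toy model in Python, not Beam or Calcite code: the Node class, the node kind strings, the SIZE_PRESERVING set, and the known_limit function are all hypothetical names made up for illustration. A "Sort" node with a fetch value stands in for a BeamSortRel that carries a LIMIT.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Toy stand-in for a relational plan node (hypothetical, not a Calcite class)."""
    kind: str                        # e.g. "Converter", "Project", "Sort", "Scan"
    inputs: Tuple["Node", ...] = ()
    fetch: Optional[int] = None      # LIMIT value carried by a "Sort" node, if any


# Assumption: these single-input node kinds emit exactly one row per input row.
SIZE_PRESERVING = {"Converter", "Project"}


def known_limit(node: Node) -> Optional[int]:
    """Return the row limit known for this node's output, or None if unknown."""
    if node.kind == "Sort" and len(node.inputs) == 1:
        below = known_limit(node.inputs[0])
        if node.fetch is None:
            # A plain sort does not change the number of rows.
            return below
        # A LIMIT caps the output; an inner limit may be tighter still.
        return node.fetch if below is None else min(node.fetch, below)
    if node.kind in SIZE_PRESERVING and len(node.inputs) == 1:
        # Look through any number of size-preserving layers.
        return known_limit(node.inputs[0])
    # Scans, joins, filters, and anything unrecognized: no known size.
    return None


# LIMIT 10 buried under several size-preserving layers is still found.
scan = Node("Scan")
limited = Node("Sort", (scan,), fetch=10)
plan = Node("Converter", (Node("Project", (Node("Project", (limited,)),)),))
print(known_limit(plan))
```

Filters are deliberately left out of SIZE_PRESERVING in this sketch: a filter above a LIMIT may drop rows, so the limit might never be reached and could not be used as a cancellation trigger on unbounded data.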



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)