Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2019/02/11 22:25:11 UTC

[GitHub] walterddr opened a new pull request #7678: [FLINK-11453][DataStreamAPI] Support SliceStream with forwardable pane info using slice assigner, operator and stream

walterddr opened a new pull request #7678: [FLINK-11453][DataStreamAPI] Support SliceStream with forwardable pane info using slice assigner, operator and stream
URL: https://github.com/apache/flink/pull/7678
 
 
   
   
   
   
   ## What is the purpose of the change
   
   This PR introduces a new `SlicedStream` abstraction, which exposes as a stream the intermediate results that `WindowOperator` buffers in its internal state.
   Each result is emitted as a `Slice`, a data type that carries all the information needed to describe one pane slice.
   With this API, further processing can be chained on top of the sliced results, for example:
   ```
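   // Pre-aggregate each key's elements into non-overlapping 5-second slices.
   val slicedStream: SlicedStream = inputStream
     .keyBy("key")
     .sliceWindow(Time.seconds(5L))
     .aggregate(aggFunc)
   // Combine the slice results into larger windows without revisiting the raw elements.
   val resultStream = slicedStream
     .window(Time.seconds(5000L))
     .aggregate(aggFunc)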
   ```
   This makes it possible to build a much more efficient sliding window operation, because elements no longer have to be duplicated into every window they belong to.
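
To illustrate why slicing avoids the duplication, here is a minimal sketch in plain Scala collections. It does not use Flink or the API from this PR; `SliceSketch`, `sliceSums`, and `slidingSums` are hypothetical names chosen for the example, and it assumes the window size is a multiple of the slice size and that the window slides by one slice.

```scala
// Illustrative only: plain Scala collections, no Flink dependency.
// All names here (SliceSketch, sliceSums, slidingSums) are hypothetical.
object SliceSketch {

  /** Pre-aggregates (timestamp, value) events into non-overlapping slices.
    * Each event contributes to exactly one slice.
    * Returns a map from slice start timestamp to the partial sum. */
  def sliceSums(events: Seq[(Long, Long)], sliceSize: Long): Map[Long, Long] =
    events
      .groupBy { case (ts, _) => ts - java.lang.Math.floorMod(ts, sliceSize) }
      .map { case (start, es) => start -> es.map(_._2).sum }

  /** Combines slice partial sums into sliding windows of `windowSize`
    * that advance by `sliceSize`.
    * Returns (windowStart, sum) for every window overlapping a slice. */
  def slidingSums(
      slices: Map[Long, Long],
      sliceSize: Long,
      windowSize: Long): Seq[(Long, Long)] = {
    require(windowSize % sliceSize == 0, "window size must be a multiple of the slice size")
    if (slices.isEmpty) Seq.empty
    else {
      // Earliest window that still covers the first slice.
      val first = slices.keys.min - (windowSize - sliceSize)
      val last = slices.keys.max
      (first to last by sliceSize).map { windowStart =>
        val covered = windowStart until (windowStart + windowSize) by sliceSize
        windowStart -> covered.map(s => slices.getOrElse(s, 0L)).sum
      }
    }
  }
}
```

With a 10-unit window sliding by 5, a conventional sliding assigner would copy every element into two windows. In this sketch each element is added to one slice only, and each window is then assembled from two slice partials.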
   
   ## Brief change log
   
     - Added `SliceAssigner`, which assigns each element to zero or one window (a.k.a. the "slice")
     - Modified the `KeyedStream` and `WindowedStream` APIs to support creating a `SlicedStream`
     - Added the `SlicedStream` concept, which can emit slicing results
     - Created the special operators `SliceOperator` and `IterableSliceOperator` to process the intermediate results
     - Added tumbling event-time and processing-time slice assigners (TumblingEvent/ProcessingTimeSliceAssigner) as examples
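
As a rough illustration of what a tumbling slice assigner computes, the sketch below maps a timestamp to the single slice that contains it. It is not the `SliceAssigner` code from this PR: `SliceAssignerSketch`, `Slice`, and `assignSlice` are hypothetical names, the `offset` parameter is an assumption, and this simplified version always returns exactly one slice rather than zero or one.

```scala
// Illustrative only: not the SliceAssigner implementation from this PR.
// SliceAssignerSketch, Slice, and assignSlice are hypothetical names.
object SliceAssignerSketch {

  /** A half-open time interval [start, end) identifying one slice. */
  final case class Slice(start: Long, end: Long)

  /** Assigns a timestamp to the one tumbling slice that contains it.
    * `offset` (an assumed parameter) shifts the slice boundaries. */
  def assignSlice(timestamp: Long, size: Long, offset: Long = 0L): Slice = {
    require(size > 0, "slice size must be positive")
    // floorMod keeps the result non-negative, so negative timestamps
    // are still aligned to the slice boundary below them.
    val start = timestamp - java.lang.Math.floorMod(timestamp - offset, size)
    Slice(start, start + size)
  }
}
```

Because slices tumble rather than overlap, every timestamp falls into one interval, which is what lets downstream windows be built from slice results instead of from copies of the raw elements.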
   
   
   ## Verifying this change
   
   - This change is already covered by multiple tests for backward compatibility 
   - This change added tests and can be verified as follows:
     - Added integration tests for end-to-end processing with reduce, aggregate, and general apply functions
     - Added translation tests in Scala that verify `SlicedStream` API conversion when chained with `KeyedStream`
     - Added integration tests that specifically cover serialization to and deserialization from state snapshots
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): yes
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? yes
     - If yes, how is the feature documented? not yet; awaiting review
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services