You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/06/10 14:53:01 UTC

[jira] [Commented] (FLINK-2145) Median calculation for windows

    [ https://issues.apache.org/jira/browse/FLINK-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580489#comment-14580489 ] 

ASF GitHub Bot commented on FLINK-2145:
---------------------------------------

Github user ggevay commented on the pull request:

    https://github.com/apache/flink/pull/684#issuecomment-110739389
  
    I did the Scala part, and added a Jira (FLINK-2145).
    As for the placement:
    @StephanEwen:
    > I think we should put the code into a streaming operator library. The streaming core is undergoing heavy changes, the windowing may be reworked in the near future, and there are plans to use managed memory for window buffer contents.
    
    I am not sure how putting this into a separate library helps a future refactoring of windowing. Most of the median code is inside the new PreReducers, so it is probably already separated from the rest of the code as well as it can be.
    (Also, if you are worried about bloating the API for the users, I don't think median would hurt in that respect either: the function of a method named "median" should be pretty clear to the users.)


> Median calculation for windows
> ------------------------------
>
>                 Key: FLINK-2145
>                 URL: https://issues.apache.org/jira/browse/FLINK-2145
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Streaming
>            Reporter: Gabor Gevay
>            Assignee: Gabor Gevay
>            Priority: Minor
>              Labels: statistics
>
> The PreReducer for this has the following algorithm: We maintain two multisets (as, for example, balanced binary search trees), that always partition the elements of the current window to smaller-than-median and larger-than-median elements. At each store and evict, we can maintain this invariant with only O(1) multiset operations.
> Store: O(log N)
> Evict: O(log N)
> emitWindow: O(1)
> memory: O(N)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)