Posted to issues@flink.apache.org by "Gabor Gevay (JIRA)" <ji...@apache.org> on 2016/05/16 12:40:12 UTC

[jira] [Commented] (FLINK-2142) GSoC project: Exact and Approximate Statistics for Data Streams and Windows

    [ https://issues.apache.org/jira/browse/FLINK-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284435#comment-15284435 ] 

Gabor Gevay commented on FLINK-2142:
------------------------------------

This proposal was based on the old (pre-0.10) windowing API. I'm now taking it apart by converting sub-tasks into stand-alone issues (FLINK-2148, FLINK-2147) and/or modifying or closing the sub-tasks that no longer make sense in the current streaming API. I will add the label `approximate` to the issues that concern approximate calculations.

Note: The main reason I abandoned this project last summer is that the streaming API was changing a lot at the time, so it seemed better to postpone this work.

> GSoC project: Exact and Approximate Statistics for Data Streams and Windows
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-2142
>                 URL: https://issues.apache.org/jira/browse/FLINK-2142
>             Project: Flink
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Gabor Gevay
>            Assignee: Gabor Gevay
>            Priority: Minor
>              Labels: gsoc2015, statistics, streaming
>
> The goal of this project is to implement basic statistics of data streams and windows (like average, median, variance, correlation, etc.) in a computationally efficient manner. This involves designing custom PreReducers.
> The exact calculation of some statistics (e.g., frequencies, or the number of distinct elements) would require memory proportional to the number of elements in the input (the window or the entire stream). However, there are efficient algorithms and data structures that use less memory by calculating the same statistics only approximately, with user-specified error bounds.
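
To illustrate the memory/accuracy trade-off described above, here is a minimal sketch of one well-known approximate frequency structure, the Count-Min sketch. The issue does not name a specific algorithm, so this choice is an assumption, and the class and parameter names (CountMinSketch, epsilon, delta) are hypothetical and not part of any Flink API. The salted blake2b hash is a practical stand-in for the pairwise-independent hash family that the formal error bound assumes.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate frequency counter (illustrative only, not a Flink API)."""

    def __init__(self, epsilon, delta):
        # Standard sizing: with probability at least 1 - delta, an estimate
        # exceeds the true count by at most epsilon * (total count added).
        self.width = math.ceil(math.e / epsilon)
        self.depth = max(1, math.ceil(math.log(1.0 / delta)))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0

    def _index(self, item, row):
        # One salted hash per row; the row number serves as the salt.
        digest = hashlib.blake2b(
            repr(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate counters, so the minimum over the rows
        # is the tightest estimate and never falls below the true count.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


sketch = CountMinSketch(epsilon=0.01, delta=0.01)
for word in ["a"] * 1000 + ["b"] * 10 + ["c"]:
    sketch.add(word)
print(sketch.estimate("a"), "of", sketch.total, "elements")
```

The memory used depends only on epsilon and delta (width times depth counters), not on the number of elements seen, which is the property the description is after. Applying this to a sliding window would additionally require handling elements that leave the window, which this sketch does not attempt.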



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)