Posted to issues@flink.apache.org by "Austin Ouyang (JIRA)" <ji...@apache.org> on 2016/04/22 07:41:12 UTC

[jira] [Commented] (FLINK-1284) Uniform random sampling operator over windows

    [ https://issues.apache.org/jira/browse/FLINK-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253383#comment-15253383 ] 

Austin Ouyang commented on FLINK-1284:
--------------------------------------

Hi Paris,

Would we also want to add the ability to sample by percentage? Also, what would the fieldID refer to? I was thinking there are two possible naive solutions:
1) Once the window trigger fires, randomly draw N samples (or a given percentage of all elements) from each window.
2) Given the percentage of elements to retain from each window, generate a random number between 0 and 1 for each element, and append the element to the result if that number is less than the specified percentage (taken as a fraction).
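The two options above can be sketched in plain Python, outside of Flink, over a window that has already been collected into a list. The function names and the explicit `random.Random` argument are hypothetical and not part of any Flink API. The sketch only contrasts the semantics of the two approaches.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def _check_fraction(fraction: float) -> None:
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")


def sample_fixed(window: Sequence[T], n: int, rng: random.Random) -> List[T]:
    """Option 1 (count): when the window fires, draw n elements uniformly without
    replacement; a window with fewer than n elements is returned whole."""
    return rng.sample(list(window), min(n, len(window)))


def sample_percentage(window: Sequence[T], fraction: float, rng: random.Random) -> List[T]:
    """Option 1 (percentage): exactly int(fraction * len(window)) elements
    (rounded down), drawn uniformly without replacement from the fired window."""
    _check_fraction(fraction)
    return rng.sample(list(window), int(fraction * len(window)))


def bernoulli_sample(window: Sequence[T], fraction: float, rng: random.Random) -> List[T]:
    """Option 2: keep each element independently with probability `fraction`.
    The expected size is fraction * len(window), but the actual size varies per
    window. Each element can be decided as it arrives, so nothing is buffered."""
    _check_fraction(fraction)
    return [x for x in window if rng.random() < fraction]
```

Option 1 returns an exact sample size, but it needs the whole window buffered until the trigger (or a reservoir maintained while the window fills). Option 2 never buffers, but it hits the requested percentage only on average.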


> Uniform random sampling operator over windows
> ---------------------------------------------
>
>                 Key: FLINK-1284
>                 URL: https://issues.apache.org/jira/browse/FLINK-1284
>             Project: Flink
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Paris Carbone
>            Priority: Minor
>
> It would be useful for several use cases to have a built-in uniform random sampling operator in the streaming API that can operate on windows. This can be used for example for online machine learning operations, evaluating heuristics or continuous visualisation of representative values.
> The operator could be given a field and the number of random samples needed, following a window statement like so:
> mystream.window(..).sample(fieldID,#samples)
> Given that pre-aggregation is enabled, this could perhaps be implemented as a binary reduce operator or a combinable GroupReduce that pre-aggregates the empirical values of that field.
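The pre-aggregation idea in the description could look roughly like the following plain-Python sketch, which uses no Flink APIs. Every partial aggregate carries the number of elements it has seen plus a uniform sample of at most k of them, and a binary combine step merges two such partials. The merge rule is a standard technique swapped in here; the issue does not specify it. It splits the k slots between the two sides in proportion to their remaining counts, which amounts to a hypergeometric split. `SampleAgg`, `leaf`, `merge` and `window_sample` are hypothetical names. The sketch also ignores fieldID and samples whole elements, assuming the field's values were extracted beforehand.

```python
import random
from dataclasses import dataclass
from functools import reduce
from typing import Generic, Iterable, List, TypeVar

T = TypeVar("T")


@dataclass
class SampleAgg(Generic[T]):
    """Partial aggregate: number of elements seen plus a uniform sample of them.
    Invariant: len(sample) == min(k, count)."""
    count: int
    sample: List[T]


def leaf(x: T) -> SampleAgg[T]:
    # A single element is trivially a uniform sample of itself.
    return SampleAgg(1, [x])


def merge(a: SampleAgg[T], b: SampleAgg[T], k: int, rng: random.Random) -> SampleAgg[T]:
    """Binary combine: uniform sample of size min(k, a.count + b.count) of the union.
    Each output slot comes from side `a` with probability rem_a / (rem_a + rem_b),
    which simulates drawing without replacement from the combined population;
    the chosen side then gives up one of its sampled elements uniformly at random."""
    pool_a, pool_b = list(a.sample), list(b.sample)
    rem_a, rem_b = a.count, b.count
    out: List[T] = []
    for _ in range(min(k, a.count + b.count)):
        if rng.randrange(rem_a + rem_b) < rem_a:
            out.append(pool_a.pop(rng.randrange(len(pool_a))))
            rem_a -= 1
        else:
            out.append(pool_b.pop(rng.randrange(len(pool_b))))
            rem_b -= 1
    return SampleAgg(a.count + b.count, out)


def window_sample(items: Iterable[T], k: int, rng: random.Random) -> SampleAgg[T]:
    # Fold a window element by element; SampleAgg(0, []) acts as the identity.
    return reduce(lambda acc, x: merge(acc, leaf(x), k, rng), items, SampleAgg(0, []))
```

Because `merge` only looks at (count, sample) pairs, it can also combine partials produced by separate pre-aggregation steps directly. That property is what would let this run as a reduce rather than as a function over a fully buffered window.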


