Posted to issues@flink.apache.org by "GaoLun (JIRA)" <ji...@apache.org> on 2015/09/02 09:10:46 UTC

[jira] [Updated] (FLINK-2535) Fixed size sample algorithm optimization

     [ https://issues.apache.org/jira/browse/FLINK-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

GaoLun updated FLINK-2535:
--------------------------
    Attachment: sampling.png

Statistics on the number of rejected items with SRS and SSRS.

> Fixed size sample algorithm optimization
> ----------------------------------------
>
>                 Key: FLINK-2535
>                 URL: https://issues.apache.org/jira/browse/FLINK-2535
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Chengxiang Li
>            Priority: Minor
>         Attachments: sampling.png
>
>
> A fixed-size sampling algorithm is known to be less efficient than fraction-based sampling algorithms, but it is sometimes necessary. Some optimizations could significantly reduce the storage size and computation cost, such as the algorithm described in [this paper|http://machinelearning.wustl.edu/mlpapers/papers/icml2013_meng13a].
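
For context, a minimal Python sketch of a baseline fixed-size sampler without replacement. It is an illustration only: it is neither the algorithm from the linked paper nor Flink's implementation, and the name fixed_size_sample and its parameters are hypothetical. Each item gets a uniform random key, and the k items with the smallest keys are kept in a bounded heap. An item whose key is not below the current k-th smallest key is rejected on arrival and never stored, which is the kind of early rejection the issue aims to increase.

```python
import heapq
import random


def fixed_size_sample(items, k, seed=None):
    """Return (sample, rejected) for a uniform sample of up to k items.

    Hypothetical helper for illustration only. `rejected` counts items
    discarded on arrival, i.e. items that were never stored in the heap.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    rng = random.Random(seed)
    # Max-heap on the random key, emulated by negating the key.
    # The index breaks ties so the items themselves are never compared.
    heap = []
    rejected = 0
    for index, item in enumerate(items):
        key = rng.random()
        if len(heap) < k:
            # Still filling the sample: keep every item.
            heapq.heappush(heap, (-key, index, item))
        elif key < -heap[0][0]:
            # Smaller than the largest kept key: evict that one.
            heapq.heapreplace(heap, (-key, index, item))
        else:
            # Cannot be among the k smallest keys: reject immediately.
            rejected += 1
    return [item for _, _, item in heap], rejected
```

Calling fixed_size_sample(range(1000), 10) returns 10 distinct items plus the count of items rejected on arrival. This baseline stores at most k items but draws a key for every input item; an optimized variant would aim to reject more items cheaply.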


