Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/07/18 17:09:21 UTC

[jira] [Commented] (FLINK-4205) Implement stratified sampling for DataSet

    [ https://issues.apache.org/jira/browse/FLINK-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382615#comment-15382615 ] 

ASF GitHub Bot commented on FLINK-4205:
---------------------------------------

GitHub user doflink opened a pull request:

    https://github.com/apache/flink/pull/2267

    [FLINK-4205] Create a simple stratified sampling function for DataSet

    Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
    If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
    In addition to going through the list, please provide a meaningful description of your changes.
    
    - [ ] General
      - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
      - The pull request addresses only one issue
      - Each commit in the PR has a meaningful commit message (including the JIRA id)
    
    - [ ] Documentation
      - Documentation has been added for new functionality
      - Old documentation affected by the pull request has been updated
      - JavaDoc for public methods has been added
    
    - [ ] Tests & Build
      - Functionality added by the pull request is covered by tests
      - `mvn clean verify` has been executed successfully locally or a Travis build has passed


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/doflink/flink simple-stratified-sampling

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2267.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2267
    
----
commit 6c3e953f4e2add830b35d212522bf614f402ee9a
Author: Le Quoc Do <le...@gmail.com>
Date:   2016-07-18T13:59:50Z

    create stratified sampling function for DataSet

commit 5a5371ee588d13c45cdbb8a73314ba10ba928888
Author: Le Quoc Do <le...@gmail.com>
Date:   2016-07-18T16:14:17Z

    add approved license for RatCheck

commit 4a111f2be14dc4d0abc917d7837403d1a8bacb32
Author: Le Quoc Do <le...@gmail.com>
Date:   2016-07-18T16:49:44Z

    update Java docs

----


> Implement stratified sampling for DataSet
> -----------------------------------------
>
>                 Key: FLINK-4205
>                 URL: https://issues.apache.org/jira/browse/FLINK-4205
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Do Le Quoc
>
> A DataSet might consist of data from disparate sources, so every source should be represented fairly for a sample to be representative. Stratified sampling is needed to ensure that data from every source (stratum) is selected and that no minority stratum is excluded.
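
To make the idea in the issue description concrete, here is a minimal sketch of stratified sampling in plain Java. It is not the code from the pull request and does not use the Flink DataSet API; the class name `StratifiedSampleSketch`, the method `sample`, and its parameters are hypothetical names chosen for illustration. The sketch assumes a fixed sample size per stratum and uses reservoir sampling (Algorithm R) within each stratum, which is one common approach among several.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

// Hypothetical illustration only; not the implementation proposed in the pull request.
final class StratifiedSampleSketch {

    /**
     * Draws up to perStratum records from every stratum in a single pass.
     * Each stratum keeps its own reservoir, so small strata are never
     * crowded out by large ones.
     */
    static <T, K> Map<K, List<T>> sample(
            Iterable<T> records, Function<T, K> stratumOf, int perStratum, long seed) {
        Random random = new Random(seed);
        Map<K, List<T>> reservoirs = new LinkedHashMap<>();
        Map<K, Integer> seen = new LinkedHashMap<>();

        for (T record : records) {
            K key = stratumOf.apply(record);
            List<T> reservoir = reservoirs.computeIfAbsent(key, k -> new ArrayList<>());
            // Number of records seen so far in this stratum, including this one.
            int count = seen.merge(key, 1, Integer::sum);

            if (reservoir.size() < perStratum) {
                // The reservoir is not full yet: keep the record unconditionally.
                reservoir.add(record);
            } else {
                // Algorithm R: replace a random slot with probability perStratum / count.
                int slot = random.nextInt(count);
                if (slot < perStratum) {
                    reservoir.set(slot, record);
                }
            }
        }
        return reservoirs;
    }

    public static void main(String[] args) {
        // Records are tagged "source:value"; the source prefix is the stratum.
        List<String> records = Arrays.asList(
                "web:1", "web:2", "web:3", "web:4", "web:5", "sensor:1");
        Map<String, List<String>> result =
                sample(records, r -> r.substring(0, r.indexOf(':')), 2, 42L);
        result.forEach((stratum, picked) -> System.out.println(stratum + " -> " + picked));
    }
}
```

In this example the `sensor` stratum has only one record, yet it still appears in the result, whereas a uniform sample over all six records could easily miss it. A distributed version would additionally need to group records by stratum across partitions, which this sketch does not attempt.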



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)