Posted to commits@beam.apache.org by mizitch <gi...@git.apache.org> on 2016/10/26 22:54:08 UTC

[GitHub] incubator-beam pull request #1199: [BEAM-840] Add Java SDK extension to supp...

GitHub user mizitch opened a pull request:

    https://github.com/apache/incubator-beam/pull/1199

    [BEAM-840] Add Java SDK extension to support non-distributed sorting

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:
    
     - [x] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
           Travis-CI on your fork and ensure the whole test matrix passes).
     - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
           number, if there is one.
     - [x] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). (covered by Google's existing agreement with Apache)
    
    ---
    
    Add an extension that provides a PTransform which performs local (non-distributed) sorting. It sorts in memory until the buffer is full, then flushes to disk and uses external sorting.
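The buffering strategy described above can be sketched in plain Java. This is a minimal, hypothetical illustration, not the PR's code. The class `BufferedExternalSortSketch`, its `maxBufferedRecords` parameter, and the temp-file format are all assumptions. Records are buffered and sorted in memory, each full buffer is flushed to a temporary file as a sorted run, and the runs are combined with a k-way merge. The PR itself delegates the on-disk phase to Hadoop, so the hand-written merge here only stands in for that library. For simplicity the buffer limit is counted in records; a real implementation would more plausibly cap memory in bytes.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

/**
 * Hypothetical sketch of buffered external sorting (not the PR's implementation):
 * sort in memory until the buffer is full, spill sorted runs to disk, then merge them.
 */
final class BufferedExternalSortSketch {

  static List<String> sort(Iterable<String> input, int maxBufferedRecords) throws IOException {
    if (maxBufferedRecords < 1) throw new IllegalArgumentException("maxBufferedRecords must be >= 1");
    List<Path> runs = new ArrayList<>();
    List<String> buffer = new ArrayList<>();
    try {
      for (String record : input) {
        buffer.add(record);
        if (buffer.size() == maxBufferedRecords) {
          runs.add(spill(buffer)); // Buffer full: flush one sorted run to disk.
          buffer.clear();
        }
      }
      if (runs.isEmpty()) {
        Collections.sort(buffer); // Everything fit in memory, so no disk I/O is needed.
        return buffer;
      }
      if (!buffer.isEmpty()) {
        runs.add(spill(buffer));
      }
      return merge(runs);
    } finally {
      for (Path run : runs) {
        Files.deleteIfExists(run); // Clean up temporary run files.
      }
    }
  }

  /** Sorts the buffer and writes it to a temp file (writeUTF allows at most 65535 encoded bytes per record). */
  private static Path spill(List<String> buffer) throws IOException {
    Collections.sort(buffer);
    Path run = Files.createTempFile("sorted-run", ".bin");
    try (DataOutputStream out =
        new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(run)))) {
      out.writeInt(buffer.size());
      for (String record : buffer) {
        out.writeUTF(record);
      }
    }
    return run;
  }

  /** K-way merge: a min-heap of run indices, ordered by each run's current head record. */
  private static List<String> merge(List<Path> runs) throws IOException {
    int k = runs.size();
    DataInputStream[] inputs = new DataInputStream[k];
    int[] remaining = new int[k];
    String[] heads = new String[k];
    PriorityQueue<Integer> heap = new PriorityQueue<>((a, b) -> heads[a].compareTo(heads[b]));
    List<String> result = new ArrayList<>();
    try {
      for (int i = 0; i < k; i++) {
        inputs[i] = new DataInputStream(new BufferedInputStream(Files.newInputStream(runs.get(i))));
        remaining[i] = inputs[i].readInt();
        advance(i, inputs, remaining, heads, heap);
      }
      while (!heap.isEmpty()) {
        int i = heap.poll(); // Run whose head is the smallest remaining record.
        result.add(heads[i]);
        advance(i, inputs, remaining, heads, heap);
      }
    } finally {
      for (DataInputStream in : inputs) {
        if (in != null) in.close();
      }
    }
    return result;
  }

  /** Reads the next record of run i, if any, and re-inserts i into the heap. */
  private static void advance(int i, DataInputStream[] inputs, int[] remaining,
      String[] heads, PriorityQueue<Integer> heap) throws IOException {
    if (remaining[i] > 0) {
      remaining[i]--;
      heads[i] = inputs[i].readUTF();
      heap.add(i);
    }
  }

  public static void main(String[] args) throws IOException {
    // A 3-record buffer forces three spills (3 + 3 + 1 records) and a 3-way merge.
    List<String> input = Arrays.asList("pear", "apple", "fig", "kiwi", "banana", "cherry", "date");
    System.out.println(sort(input, 3));
  }
}
```

Running `main` prints `[apple, banana, cherry, date, fig, kiwi, pear]`. The heap holds run indices rather than records, and an index is removed before its head is replaced, so the heap ordering stays valid.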
    
    Consumes a PCollection of KVs that map each primary key to an iterable of KVs of secondary key and value, and sorts each iterable by secondary key. It would typically be applied after a GroupByKey. Coders are used to convert secondary keys and values into byte arrays, and the secondary keys are compared lexicographically.
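To illustrate what a lexicographical comparison of coder-encoded secondary keys implies, here is a hypothetical sketch. `EncodedKeyOrderSketch` is an invented name, and its `encodeLong` (a fixed-width big-endian encoding) stands in for a coder; neither comes from the PR. It compares encoded keys byte by byte as unsigned values, with a shorter prefix sorting first. The resulting order is the order of the encoded bytes, which need not match the key type's natural order. With this particular encoding, negative longs sort after positive ones.

```java
import java.nio.ByteBuffer;
import java.util.*;

/**
 * Hypothetical sketch: orders (secondary key, value) pairs within one group by
 * comparing the secondary keys' encoded bytes lexicographically (unsigned).
 */
final class EncodedKeyOrderSketch {

  /** Unsigned lexicographic byte comparison; a proper prefix sorts first. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) return cmp;
    }
    return Integer.compare(a.length, b.length);
  }

  /** Stand-in for a coder (assumption): 8-byte big-endian two's-complement encoding. */
  static byte[] encodeLong(long key) {
    return ByteBuffer.allocate(Long.BYTES).putLong(key).array();
  }

  /** Sorts one group's pairs by the encoded form of their secondary keys. */
  static List<Map.Entry<Long, String>> sortByEncodedKey(List<Map.Entry<Long, String>> group) {
    List<Map.Entry<Long, String>> sorted = new ArrayList<>(group);
    sorted.sort((x, y) -> compareBytes(encodeLong(x.getKey()), encodeLong(y.getKey())));
    return sorted;
  }

  public static void main(String[] args) {
    List<Map.Entry<Long, String>> group = Arrays.<Map.Entry<Long, String>>asList(
        new AbstractMap.SimpleEntry<>(20L, "b"),
        new AbstractMap.SimpleEntry<>(-1L, "neg"),
        new AbstractMap.SimpleEntry<>(3L, "a"));
    System.out.println(sortByEncodedKey(group));
  }
}
```

Running `main` prints `[3=a, 20=b, -1=neg]`, because the encoding of `-1` begins with byte `0xFF`. Callers who need a specific order would have to choose a coder whose byte order matches it.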
    
    Uses Hadoop as an external sorting library.
    
    Hi @dhalperi, can you please take a look?
    
    https://issues.apache.org/jira/browse/BEAM-840

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mizitch/incubator-beam sorter-extension

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/1199.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1199
    
----
commit e18fd4080b9331953c6cd8ba1fa509cfe56c787b
Author: Mitch Shanklin <ms...@google.com>
Date:   2016-10-25T23:17:01Z

    Add an extension that provides a PTransform which performs local (non-distributed) sorting. It sorts in memory until the buffer is full, then flushes to disk and uses external sorting.
    
    Consumes a PCollection of KVs that map each primary key to an iterable of KVs of secondary key and value, and sorts each iterable by secondary key. It would typically be applied after a GroupByKey. Coders are used to convert secondary keys and values into byte arrays, and the secondary keys are compared lexicographically.
    
    Uses Hadoop as an external sorting library.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes to have it, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-beam pull request #1199: [BEAM-840] Add Java SDK extension to supp...

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit closed the pull request at:

    https://github.com/apache/incubator-beam/pull/1199

