Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/10/26 22:54:59 UTC
[jira] [Commented] (BEAM-840) Add Java SDK extension to support non-distributed sorting
[ https://issues.apache.org/jira/browse/BEAM-840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609942#comment-15609942 ]
ASF GitHub Bot commented on BEAM-840:
-------------------------------------
GitHub user mizitch opened a pull request:
https://github.com/apache/incubator-beam/pull/1199
[BEAM-840] Add Java SDK extension to support non-distributed sorting
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
- [x] Make sure the PR title is formatted like:
  `[BEAM-<Jira issue #>] Description of pull request`
- [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
  Travis-CI on your fork and ensure the whole test matrix passes.)
- [x] Replace `<Jira issue #>` in the title with the actual Jira issue
  number, if there is one.
- [x] If this contribution is large, please file an Apache
  [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). (covered by Google's existing agreement with Apache)
---
Add an extension that provides a PTransform which performs local (non-distributed) sorting. It sorts in memory until the buffer is full, then flushes to disk and switches to external sorting.
The transform consumes a PCollection of KVs that map each primary key to an iterable of (secondary key, value) KVs, and sorts each iterable by secondary key. It would typically be applied after a GroupByKey. It uses coders to convert secondary keys and values into byte arrays and compares the encoded secondary keys lexicographically.
Hadoop is used as the external sorting library.
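The byte-wise ordering described above can be illustrated with a minimal, Beam-free sketch. It is not the code from this pull request: the class name `SecondaryKeySortSketch`, the `encode` helper (a stand-in for a coder), and the choice to compare bytes as unsigned values are all assumptions made for illustration. It covers only the in-memory case and does not attempt the spill-to-disk path.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; names here are not taken from the pull request.
class SecondaryKeySortSketch {

  // Lexicographic comparison of two byte arrays.
  // Assumption: bytes are compared as unsigned values (0..255).
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    // All shared positions are equal, so the shorter array sorts first.
    return Integer.compare(a.length, b.length);
  }

  // Stand-in for a coder: encodes a secondary key as UTF-8 bytes.
  static byte[] encode(String key) {
    return key.getBytes(StandardCharsets.UTF_8);
  }

  // Sorts the (secondary key, value) pairs of one primary key in memory,
  // ordering them by the encoded bytes of the secondary key.
  static List<Map.Entry<String, Integer>> sortBySecondaryKey(
      Iterable<Map.Entry<String, Integer>> values) {
    List<Map.Entry<String, Integer>> sorted = new ArrayList<>();
    for (Map.Entry<String, Integer> e : values) {
      sorted.add(e);
    }
    sorted.sort((x, y) -> compareBytes(encode(x.getKey()), encode(y.getKey())));
    return sorted;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> values = new ArrayList<>();
    values.add(new SimpleEntry<>("pear", 1));
    values.add(new SimpleEntry<>("apple", 2));
    values.add(new SimpleEntry<>("Zebra", 3));
    // Byte order differs from dictionary order: 'Z' (0x5A) sorts before 'a' (0x61).
    for (Map.Entry<String, Integer> e : sortBySecondaryKey(values)) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}
```

One consequence of comparing encoded bytes rather than decoded keys is that the resulting order depends on the coder's encoding, so it may not match the natural ordering of the key type.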
Hi @dhalperi, can you please take a look?
https://issues.apache.org/jira/browse/BEAM-840
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mizitch/incubator-beam sorter-extension
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-beam/pull/1199.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1199
----
commit e18fd4080b9331953c6cd8ba1fa509cfe56c787b
Author: Mitch Shanklin <ms...@google.com>
Date: 2016-10-25T23:17:01Z
----
> Add Java SDK extension to support non-distributed sorting
> ---------------------------------------------------------
>
> Key: BEAM-840
> URL: https://issues.apache.org/jira/browse/BEAM-840
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-extensions
> Affects Versions: 0.4.0-incubating
> Reporter: Mitch Shanklin
> Assignee: Mitch Shanklin
> Priority: Minor
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)