Posted to commits@beam.apache.org by dhalperi <gi...@git.apache.org> on 2016/10/25 03:21:46 UTC

[GitHub] incubator-beam pull request #1184: [BEAM-814] Dataflow: parallelize hashing/...

GitHub user dhalperi opened a pull request:

    https://github.com/apache/incubator-beam/pull/1184

    [BEAM-814] Dataflow: parallelize hashing/compressing of files to stage

    R: @lukecwik 
    
    This is a WIP for the linked issue. Early feedback appreciated.
    
    I decided to parallelize only the hashing/compressing step, keeping it separate from uploading. Hashing should need only a very small buffer per file, so it can be parallelized almost arbitrarily. Uploading, by contrast, uses a very large buffer per file and cannot be parallelized as simply.
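
A minimal sketch of the idea in the paragraph above, assuming a plain `ExecutorService` and SHA-256 digests encoded as URL-safe Base64. The class `ParallelFileHasher`, the method `hashAll`, the `threads` parameter and the choice of hash are all hypothetical illustrations. This is not the code in the PR, and Beam's `PackageUtil` may use a different hash and naming scheme. It shows why hashing parallelizes cheaply: each task holds only a small read buffer, so many files can be hashed concurrently before any upload starts.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: hashes the files to stage in parallel, before any upload happens. */
public final class ParallelFileHasher {
  // Small per-file read buffer (an assumed size); this is what makes wide parallelism cheap.
  private static final int BUFFER_SIZE = 8 * 1024;

  private ParallelFileHasher() {}

  /** Maps each path to the URL-safe, unpadded Base64 SHA-256 digest of its contents. */
  public static Map<Path, String> hashAll(List<Path> files, int threads)
      throws IOException, InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // Submit one hashing task per file; they run concurrently on the pool.
      List<Future<String>> futures = new ArrayList<>();
      for (Path file : files) {
        futures.add(pool.submit(() -> hashOne(file)));
      }
      // Collect results in input order, surfacing the first failure as an IOException.
      Map<Path, String> result = new LinkedHashMap<>();
      for (int i = 0; i < files.size(); i++) {
        try {
          result.put(files.get(i), futures.get(i).get());
        } catch (ExecutionException e) {
          throw new IOException("Failed to hash " + files.get(i), e.getCause());
        }
      }
      return result;
    } finally {
      pool.shutdownNow();
    }
  }

  /** Streams one file through SHA-256 using a small fixed buffer. */
  private static String hashOne(Path file) throws IOException, NoSuchAlgorithmException {
    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    byte[] buffer = new byte[BUFFER_SIZE];
    try (InputStream in = Files.newInputStream(file)) {
      int n;
      while ((n = in.read(buffer)) != -1) {
        digest.update(buffer, 0, n);
      }
    }
    return Base64.getUrlEncoder().withoutPadding().encodeToString(digest.digest());
  }
}
```

Under this split, uploading would run afterwards as its own step with a much smaller degree of parallelism, because each upload holds a far larger buffer than a hashing task does.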

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhalperi/incubator-beam dataflow-runner-speedup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/1184.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1184
    
----
commit 04c7e1cc8596a7be8496e8f7e1e186e727366409
Author: Dan Halperin <dh...@google.com>
Date:   2016-10-25T00:27:23Z

    PackageUtil: remove deprecated, unused function

commit b625e840b78a8dd5686be66bf949b8e42dba42db
Author: Dan Halperin <dh...@google.com>
Date:   2016-10-25T02:20:53Z

    PackageUtil: parallelize hashing of files

----

