Posted to dev@kafka.apache.org by ewencp <gi...@git.apache.org> on 2017/06/20 06:23:19 UTC

[GitHub] kafka pull request #3380: MINOR: Add serialized vagrant rsync until upstream...

GitHub user ewencp opened a pull request:

    https://github.com/apache/kafka/pull/3380

    MINOR: Add serialized vagrant rsync until upstream fixes broken parallelism

    See https://github.com/mitchellh/vagrant/issues/7531. The core of the issue is that `vagrant rsync` draws from a fixed set of 1000 possible temp file entries for the SSH ControlMaster files it uses to cache SSH connections for rsyncing. A few notes:
    
    * We can't break down the steps further and maintain performance due to various limitations in vagrant/vagrant-aws: rsync is only executed on `vagrant up`/`vagrant reload`/`vagrant rsync`, you can't enable/disable an rsync shared folder only during some of those stages, and provisioning only runs in parallel with vagrant-aws during `vagrant up`.
    * We need to isolate each of the serialized rsync calls. This is required because even across calls they could randomly choose the same temporary file. (If we assumed `parallel` was available, we could actually get the parallelism back.)
    * If multiple instances might run on the same server at the same or nearly the same time, they can conflict, since the same temp file entries are used globally. This means anything running on shared CI servers might end up syncing data between different CI jobs (!!), which could lead to some very strange results, especially if the jobs aren't even of the same type.
    * The provisioning error check needs to be removed because it is catching rsync errors, but those can still happen in the initial `vagrant up` rsync step before the `vagrant up` provisioning step. It seems likely this bug was the cause of the missing files anyway, so this check might not be as valuable anymore.
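
To give a rough sense of why a pool of 1000 temp file entries is small, here is a minimal sketch of the birthday-problem arithmetic. It assumes each rsync call picks an entry uniformly and independently at random; that is a simplifying assumption for illustration, not a description of how vagrant actually chooses names. The function name `collision_probability` and its parameters are hypothetical.

```python
def collision_probability(n_calls: int, n_entries: int = 1000) -> float:
    """Chance that at least two of n_calls pick the same temp file entry.

    Assumption (not from vagrant's source): every call chooses one of
    n_entries uniformly and independently at random.
    """
    p_no_collision = 1.0
    for i in range(n_calls):
        # The i-th call must avoid the i entries already taken.
        p_no_collision *= max(n_entries - i, 0) / n_entries
    return 1.0 - p_no_collision


if __name__ == "__main__":
    for n in (2, 10, 40, 100):
        print(f"{n:>3} concurrent rsync calls: {collision_probability(n):.3f}")
```

Under this model the chance of a clash grows roughly with the square of the number of concurrent calls while that number is small, which is consistent with the concern about shared CI servers running several jobs at once.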

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ewencp/kafka deparallelize-rsync

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3380.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3380
    
----
commit 96d68f6141d3b84d6c8622f3212e29ee2d0db726
Author: Ewen Cheslack-Postava <me...@ewencp.org>
Date:   2017-06-20T01:58:32Z

    MINOR: Add serialized vagrant rsync until upstream fixes broken parallelism
    
    See https://github.com/mitchellh/vagrant/issues/7531

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] kafka pull request #3380: MINOR: Add serialized vagrant rsync until upstream...

Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit closed the pull request at:

    https://github.com/apache/kafka/pull/3380

