Posted to discuss-archive@mxnet.apache.org by MaxiBoether via MXNet Forum <mx...@discoursemail.com.INVALID> on 2021/01/20 09:18:39 UTC

[MXNet Forum] Synchronization and Update Function of Parameter Server (distributed KV-Store)


Hi,

I've got two questions about the distributed KV-store (`mx.kv.create('dist_sync')`)  available in MXNet.

1) The documentation states that pushes return immediately and are executed asynchronously. It then states that we can use `_barrier()` to sync all workers. However, this is only discussed in the context of asynchronous execution.

I do not fully understand what "syncing all workers" means here. Workers are not schedulers or servers; they are the ones executing the control flow. So if I'm not training a neural network but just using the parameter server directly (which is what I want to do for testing purposes), the worker is the one calling `kv.push`. How can the server then sync all workers? The server cannot push directly to a worker, can it? A worker can only pull from or push to the server. I imagine that calling `_barrier()` force-pulls on all workers, but suppose we do something like this example from the documentation:

```
        >>> # push a list of keys.
        >>> # single device
        >>> keys = ['4', '5', '6']
        >>> kv.push(keys, [mx.nd.ones(shape)]*len(keys))
        >>> b = [mx.nd.zeros(shape)]*len(keys)
        >>> kv.pull(keys, out=b)
        >>> print(b[1].asnumpy())
        [[ 1.  1.  1.]
         [ 1.  1.  1.]]
```

I do not see how the KV-store would know which variable (here `b`) to update on each worker when `_barrier()` is called.
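One alternative reading is that `_barrier()` moves no data at all and is only a rendezvous point among the workers: each worker blocks until all of them have arrived, and every worker still has to pull explicitly afterwards. The toy model below illustrates that reading using only the Python standard library. It does not use MXNet, and `store`, `seen`, `worker` and `NUM_WORKERS` are hypothetical names chosen for the sketch. Whether the real `_barrier()` behaves this way is an assumption here, not something the sketch proves.

```python
import threading

NUM_WORKERS = 3                          # hypothetical number of workers
barrier = threading.Barrier(NUM_WORKERS)
store = {}                               # stands in for the server-side state
seen = [None] * NUM_WORKERS              # what each worker ends up pulling


def worker(rank):
    if rank == 0:
        store["4"] = 1.0                 # only rank 0 "pushes" the initial value
    barrier.wait()                       # block until every worker has arrived
    seen[rank] = store["4"]              # each worker still "pulls" explicitly


threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen)
```

In this model the barrier never touches `seen`; it only guarantees that the write by rank 0 happened before any worker reads.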

I also wonder about push guarantees. Is there a barrier that forces completion of all pending operations? Again, `_barrier()` seems to be about workers, not servers. I want to benchmark parameter server primitives, so I need to know when an operation has actually finished.
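The benchmarking concern can be shown without MXNet: if a call only enqueues work, stopping the clock when the call returns measures the enqueue, not the operation. The sketch below uses a standard-library thread pool as a stand-in for the asynchronous engine; `slow_push` and the 50 ms delay are hypothetical. In MXNet the blocking step would presumably be something like `wait_to_read()` on a pulled array or `mx.nd.waitall()`, but whether those also guarantee that a distributed push has completed on the server is exactly the open question.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_push(value):
    # Stand-in for the background work that a push triggers.
    time.sleep(0.05)
    return value


with ThreadPoolExecutor(max_workers=1) as pool:
    start = time.perf_counter()
    pending = pool.submit(slow_push, 42)     # returns before the work is done
    t_return = time.perf_counter() - start   # time until the call returned
    result = pending.result()                # block until the work has finished
    t_done = time.perf_counter() - start     # time until completion

print("returned after %.4f s, completed after %.4f s" % (t_return, t_done))
```

Only `t_done` is a meaningful latency for the operation itself, which is why an explicit completion point is needed.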

2) We can use `_set_updater` locally, but that has no effect on how the values pushed by the workers are aggregated. There is also the `set_optimizer` function, but that is on another level of abstraction. Suppose I want to define my own update function like the one in the documentation (https://mxnet.apache.org/versions/1.7.0/api/python/docs/tutorials/packages/kvstore/kvstore.html?highlight=distributed):

```
def update(key, input, stored):
    print("update on key: %d" % key)
    stored += input * 2
```

I can't figure out how to use this in the distributed KV-store context, as `kv._set_updater(update)` only works for local KV-stores, while `set_optimizer` requires built-in optimizers like SGD.
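To pin down the behaviour being asked for, here is a minimal pure-Python model of the contract: the values pushed by all workers are summed first, and the custom function is then applied once to the stored value. `ToyStore` is a hypothetical stand-in, not the MXNet class, and scalars in a one-element list replace NDArrays so that the updater can modify the stored value in place.

```python
class ToyStore:
    """Hypothetical stand-in for a KV-store; not the MXNet implementation."""

    def __init__(self, updater):
        self._data = {}
        self._updater = updater

    def init(self, key, value):
        self._data[key] = [value]            # one-element list, mutable in place

    def push(self, key, values):
        merged = sum(values)                 # aggregate what the workers pushed
        self._updater(key, merged, self._data[key])

    def pull(self, key):
        return self._data[key][0]


def update(key, merged, stored):
    # Same rule as the tutorial's updater: stored += input * 2
    stored[0] += merged * 2


kv = ToyStore(update)
kv.init(3, 1.0)
kv.push(3, [1.0, 1.0, 1.0, 1.0])             # four workers each push 1.0
print(kv.pull(3))                            # 1.0 + (4 * 1.0) * 2 = 9.0
```

The goal would be for a `dist_sync` store to run the equivalent of `update` on the server side after aggregation.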

Thank you very much for your help!

Best,
Maximilian





---
[Visit Topic](https://discuss.mxnet.apache.org/t/synchronization-and-update-function-of-parameter-server-distributed-kv-store/6823/1) or reply to this email to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click here](https://discuss.mxnet.apache.org/email/unsubscribe/b3057be9d6ca28435a5107e7df54a2a367d6b938f840f4addba1e5be8312d83d).