Posted to commits@mxnet.apache.org by gi...@git.apache.org on 2017/08/07 18:39:43 UTC

[GitHub] LakeCarrot opened a new issue #7341: Usage of Tensorboard in Distributed MXNet

LakeCarrot opened a new issue #7341: Usage of Tensorboard in Distributed MXNet
URL: https://github.com/apache/incubator-mxnet/issues/7341
 
 
   Hi all,
   I tried to use Tensorboard to visualize my model training process. In single-node training mode, the usage of Tensorboard is straightforward. Things are different when it comes to distributed training mode. Suppose I have 2 servers and 4 workers in my cluster: how can I use Tensorboard to track the overall training process? As far as I can tell, there will be 4 different sets of log files, one on each worker, and I would need 4 separate Tensorboard processes to visualize the whole process.
   After some research, I found the following question on StackOverflow, which says that in TensorFlow, only one of the workers needs to write the logs.
   https://stackoverflow.com/questions/37411005/unable-to-use-tensorboard-in-distributed-tensorflow
   I wonder what the intended usage of Tensorboard in Distributed MXNet is. My main concern with writing summaries on only one of the workers is whether the logs from a single worker are representative of the overall learning process.
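   For reference, the pattern from the TensorFlow answer linked above translates to MXNet roughly as follows: gate summary writing on the worker's kvstore rank, so only one "chief" worker (rank 0) writes event files. This is only a sketch of what I have in mind, not a tested setup; the helper name `is_chief` is my own, and the commented-out part assumes the `mxboard` SummaryWriter and a `dist_sync` kvstore:

```python
# Gate Tensorboard logging on the worker's kvstore rank so that only one
# worker (the "chief", rank 0) writes event files.

def is_chief(rank):
    """Return True only on the worker that should write summaries."""
    return rank == 0

# In a distributed MXNet training script this would be used roughly as
# (untested sketch):
#
#   import mxnet as mx
#   from mxboard import SummaryWriter   # pip install mxboard
#
#   kv = mx.kvstore.create('dist_sync')
#   sw = SummaryWriter(logdir='./logs') if is_chief(kv.rank) else None
#
#   for epoch in range(num_epochs):
#       loss = train_one_epoch(...)
#       if sw is not None:
#           sw.add_scalar(tag='train_loss', value=loss, global_step=epoch)
```

   Whether rank 0's loss curve is a fair proxy for the whole cluster is exactly my question below.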
   @zihaolucky Thanks a lot for your work on bringing Tensorboard to MXNet. Do you have any thoughts on this question?
   Thanks in advance!
   Bo
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services