Posted to issues@mxnet.apache.org by "Lin Yuan (JIRA)" <ji...@apache.org> on 2018/10/17 17:14:00 UTC
[jira] [Updated] (MXNET-1112) float16 HIERARCHICAL_ALLREDUCE not working
[ https://issues.apache.org/jira/browse/MXNET-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lin Yuan updated MXNET-1112:
----------------------------
Description:
-Error message: *** An error occurred in MPI_Allreduce: the reduction operation MPI_SUM is not defined for non-intrinsic datatypes
-Suspected reason: Horovod's MPI path on CPU (i.e. non-CUDA-aware MPI) does not define MPI_SUM for float16, since float16 is not an intrinsic MPI datatype
-We are running into an issue with `update_multi_precision` in https://github.com/ctcyang/horovod/blob/0a0240113fe5a24ec2c772fd7309840ba179562a/horovod/mxnet/__init__.py#L47 We don't yet have a way of hooking into SGD's `update_multi_precision` to perform the `hvd.allreduce` after the gradient is cast to float32 but before the weight update. As currently written, `hvd.allreduce` all-reduces in `float16`, which does not presently support hierarchical allreduce
-This matters because our scalability experiments on 256 GPUs in float32 mode show 68% scaling efficiency with HOROVOD_HIERARCHICAL_ALLREDUCE=0 versus 92.1% with HOROVOD_HIERARCHICAL_ALLREDUCE=1. If the same trend holds for float16, hierarchical allreduce will be a necessity for good scalability
-It is a good idea to rebase onto Horovod `master` if you haven't done so already, to take advantage of this new performance-improving feature
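For context on the custom-reduction route (fix 1 below): open-source MPI implementations define MPI_SUM only for intrinsic datatypes, so summing float16 buffers requires registering a user-defined reduction operation. The sketch below shows, with NumPy only, the buffer-level semantics such a user function would have to implement; with mpi4py it would be registered via `MPI.Op.Create`. This is an illustrative sketch, not Horovod's actual code:

```python
# Illustrative sketch (not Horovod's code): the element-wise reduction a
# custom MPI_Op for float16 MPI_SUM would perform on raw message buffers.
import numpy as np

def float16_sum(in_buf, inout_buf):
    """Interpret both raw buffers as float16 and accumulate in-place."""
    src = np.frombuffer(in_buf, dtype=np.float16)      # read-only view
    dst = np.frombuffer(inout_buf, dtype=np.float16)   # writable (bytearray)
    dst += src  # element-wise float16 addition; result stays float16

# Simulate two ranks' send buffers:
a = np.array([1.5, 2.5], dtype=np.float16)
b = bytearray(np.array([0.5, 0.25], dtype=np.float16).tobytes())
float16_sum(a.tobytes(), b)
print(np.frombuffer(b, dtype=np.float16))
```

Accumulating in float16 is also where the convergence concern in fix (1) comes from: each partial sum is rounded to 10-bit precision.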
-3 possible fixes:
1) Add an MPI_SUM reduction for float16 and do the gradient all-reduce in float16 (the model may be difficult to converge at this precision)
2) Hook into `update_multi_precision` to all-reduce after the gradient is cast to float32 and before the weight update
3) Hardcode the `hvd.allreduce` call here. Problem: it might not be possible for MXNet to import Horovod?
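Fix (2) can be sketched as follows. This is a minimal NumPy simulation of a multi-precision SGD step with a pluggable all-reduce, where `fake_allreduce` stands in for `hvd.allreduce` and all names are hypothetical, not MXNet's actual optimizer API. The point is only the ordering: cast float16 gradients up to float32 first, all-reduce in float32 (where hierarchical allreduce is supported), then apply the update to the float32 master weight:

```python
# Hypothetical sketch of fix (2); names are illustrative, not MXNet's API.
import numpy as np

def multi_precision_update(weight_fp32, grads_fp16, lr, allreduce_fn):
    # Cast each worker's float16 gradient up to float32 first...
    grads_fp32 = [g.astype(np.float32) for g in grads_fp16]
    # ...then all-reduce in float32, where hierarchical allreduce works...
    avg_grad = allreduce_fn(grads_fp32)
    # ...and apply the SGD step to the float32 master weight in place.
    weight_fp32 -= lr * avg_grad
    # The float16 weight used by the forward pass is a downcast copy.
    return weight_fp32.astype(np.float16)

def fake_allreduce(bufs):
    # Stand-in for hvd.allreduce: average across simulated workers.
    return sum(bufs) / len(bufs)

w = np.array([1.0, 1.0], dtype=np.float32)
g = [np.array([0.5, 1.0], dtype=np.float16),
     np.array([1.5, 1.0], dtype=np.float16)]
w_fp16 = multi_precision_update(w, g, lr=0.1, allreduce_fn=fake_allreduce)
```

The weight update itself happens entirely in float32; float16 only appears at the cast boundaries, which is what sidesteps the missing float16 MPI_SUM.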
> float16 HIERARCHICAL_ALLREDUCE not working
> ------------------------------------------
>
> Key: MXNET-1112
> URL: https://issues.apache.org/jira/browse/MXNET-1112
> Project: Apache MXNet
> Issue Type: Improvement
> Components: Apache MXNet Backend
> Reporter: Lin Yuan
> Priority: Major
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For additional commands, e-mail: issues-help@mxnet.apache.org