Posted to issues@mxnet.apache.org by "Lin Yuan (JIRA)" <ji...@apache.org> on 2018/10/17 17:14:00 UTC

[jira] [Updated] (MXNET-1112) float16 HIERARCHICAL_ALLREDUCE not working

     [ https://issues.apache.org/jira/browse/MXNET-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lin Yuan updated MXNET-1112:
----------------------------
    Description: 
  -Error message: *** An error occurred in MPI_Allreduce: the reduction operation MPI_SUM is not defined for non-intrinsic datatypes
  -Suspected reason: Horovod's MPI path on the CPU (i.e. non-CUDA-aware MPI) does not define the MPI_SUM reduction for float16, so the allreduce fails.
  -We are running into an issue with `update_multi_precision` (see https://github.com/ctcyang/horovod/blob/0a0240113fe5a24ec2c772fd7309840ba179562a/horovod/mxnet/__init__.py#L47): we don't yet have a way of hooking into SGD's `update_multi_precision` to do the `hvd.allreduce` after the gradient has been cast to float32 and before the weight update. As written now, `hvd.allreduce` all-reduces in `float16`, which does not currently support hierarchical allreduce.
  -This is an issue because our scalability experiments on 256 GPUs in float32 mode show 68% scaling efficiency with HOROVOD_HIERARCHICAL_ALLREDUCE=0 and 92.1% with HOROVOD_HIERARCHICAL_ALLREDUCE=1. If the same trend holds for float16, hierarchical allreduce will be a necessity for good scalability.
  -It is a good idea to rebase onto Horovod `master` if you haven't done so already, to take advantage of this new performance-improving feature.
  -Three possible fixes:
    1) Add an MPI_SUM implementation for float16 and do the gradient all-reduce in float16 (the model may be harder to converge).
    2) Hook into `update_multi_precision` after the gradient is cast to float32 and before the weight update (see the sketch after this list).
    3) Hardcode `hvd.allreduce` here. Problem: it might not be possible for mxnet to import horovod?
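
A rough Python sketch of what fix 2 could look like is below. It is only an illustration, not how the linked branch works today: the `Fp32AllreduceSGD` name is hypothetical, and it assumes the `hvd.allreduce(tensor, average=..., name=...)` call behaves as in the horovod.mxnet code linked above.

import numpy as np
import mxnet as mx
import horovod.mxnet as hvd

# Hypothetical optimizer wrapper illustrating fix 2: cast the float16
# gradient to float32, all-reduce it in float32 (where MPI_SUM and
# hierarchical allreduce are supported), then run the stock
# multi-precision SGD update against the float32 master weights.
class Fp32AllreduceSGD(mx.optimizer.SGD):
    def update_multi_precision(self, index, weight, grad, state):
        if grad.dtype == np.float16:
            # Up-cast before the collective; the float16 path is what
            # currently breaks.
            grad = grad.astype(np.float32)
        # Average the float32 gradient across workers. The name only has
        # to be consistent across workers for a given parameter.
        grad = hvd.allreduce(grad, average=True, name=str(index))
        # Stock SGD then applies the reduced gradient to the float32 copy
        # of the weight and writes the result back to the float16 weight.
        super(Fp32AllreduceSGD, self).update_multi_precision(
            index, weight, grad, state)

A wrapper like this would be passed to the trainer in place of plain SGD; whether something equivalent can live inside mxnet itself runs into fix 3's question of importing horovod from mxnet.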

> float16 HIERARCHICAL_ALLREDUCE not working
> ------------------------------------------
>
>                 Key: MXNET-1112
>                 URL: https://issues.apache.org/jira/browse/MXNET-1112
>             Project: Apache MXNet
>          Issue Type: Improvement
>          Components: Apache MXNet Backend
>            Reporter: Lin Yuan
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For additional commands, e-mail: issues-help@mxnet.apache.org