Posted to dev@ignite.apache.org by "Anton Dmitriev (JIRA)" <ji...@apache.org> on 2018/11/02 15:20:01 UTC
[jira] [Created] (IGNITE-10133) ML: Switch to per-node TensorFlow worker strategy
Anton Dmitriev created IGNITE-10133:
---------------------------------------
Summary: ML: Switch to per-node TensorFlow worker strategy
Key: IGNITE-10133
URL: https://issues.apache.org/jira/browse/IGNITE-10133
Project: Ignite
Issue Type: Improvement
Components: ml
Affects Versions: 2.8
Reporter: Anton Dmitriev
Assignee: Anton Dmitriev
Fix For: 2.8
Currently we start a TensorFlow worker process for every cache partition. If a node is equipped with a GPU and TensorFlow uses it, a single worker process acquires all of the GPU memory. If two worker processes on the same node both try to acquire all of the GPU memory, they will fail.
To eliminate this problem and allow users to utilize GPUs during training, we need to switch to a per-node strategy. That is, we need to start one TensorFlow worker process per node, not per partition.
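The difference between the two strategies can be illustrated with a minimal sketch. This is not the Ignite implementation. The function names, the node identifiers, and the partition-to-node layout below are all hypothetical, chosen only to show how many worker processes each strategy would place on a node.

```python
from collections import defaultdict


def plan_per_partition(partition_to_node):
    """Current strategy: one worker per partition.

    Returns a list of (node, [partition]) pairs, one pair per worker.
    """
    return [(node, [part]) for part, node in sorted(partition_to_node.items())]


def plan_per_node(partition_to_node):
    """Proposed strategy: one worker per node.

    Each worker serves every partition hosted on its node.
    """
    by_node = defaultdict(list)
    for part, node in sorted(partition_to_node.items()):
        by_node[node].append(part)
    return sorted(by_node.items())


def workers_per_node(plan):
    """Count how many worker processes a plan places on each node."""
    counts = defaultdict(int)
    for node, _parts in plan:
        counts[node] += 1
    return dict(counts)


# Hypothetical layout: four partitions spread over two nodes.
partition_to_node = {0: "node-A", 1: "node-A", 2: "node-B", 3: "node-B"}

per_partition = workers_per_node(plan_per_partition(partition_to_node))
per_node = workers_per_node(plan_per_node(partition_to_node))
```

With this layout, the per-partition plan places two workers on each node, so two processes would compete for the same GPU. The per-node plan places exactly one worker on each node.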
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)