Posted to issues@madlib.apache.org by "Domino Valdano (JIRA)" <ji...@apache.org> on 2019/03/20 22:57:00 UTC

[jira] [Comment Edited] (MADLIB-1308) Change GPU related hardcoded things in madlib_keras.py_in

    [ https://issues.apache.org/jira/browse/MADLIB-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797632#comment-16797632 ] 

Domino Valdano edited comment on MADLIB-1308 at 3/20/19 10:56 PM:
------------------------------------------------------------------

 

1.

a.)  Yes, we should rename this to segments_per_host and detect it automatically.  Also, because you can have a different number of segments on each host, we either need to have the transition function do this detection separately on each segment (so that it detects the number of segments on its own host), or change it into an array (a catalog-query sketch follows after 1b below).

b.)  No, it should never be anything but gpu0 or cpu0.  The other gpus are there, but we intentionally hide all but one from each segment.  Every segment must use either exactly 1 gpu or the cpus (and the cpus only show up as cpu0 no matter how many there are; we only found this through testing, so maybe we should confirm it in the docs to be sure).  The reason for this decision is that we tested with more than 1 gpu per segment and found the performance nearly identical to a single gpu, so almost 1 full gpu is just wasted.  Allowing more than 1 segment to share a gpu is a separate question (see 3a below).  Therefore, we require that there be at most 1 gpu per segment; any other gpus will be ignored.
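
For 1a, here's a minimal sketch of the array approach, assuming it runs on the master in PL/Python and that the Greenplum gp_segment_configuration catalog is available (the helper name get_segments_per_host is illustrative, not committed code):

{code:python}
import plpy  # provided by PL/Python inside the database

def get_segments_per_host():
    # Count primary segments per host.  content = -1 is the master,
    # so exclude it.  Returning a per-host map (rather than a single
    # number) handles hosts with different segment counts.
    rows = plpy.execute("""
        SELECT hostname, count(*) AS seg_count
        FROM gp_segment_configuration
        WHERE role = 'p' AND content >= 0
        GROUP BY hostname
    """)
    return dict((r['hostname'], r['seg_count']) for r in rows)
{code}

And a quick way to confirm the 1b behavior on a given box, assuming the TF 1.x device_lib API that madlib_keras.py_in builds on (the env var must be set before TF initializes CUDA):

{code:python}
import os

# Expose only physical gpu 2 to this process; this must happen
# before TF touches CUDA, or it is ignored.
os.environ['CUDA_VISIBLE_DEVICES'] = '2'

from tensorflow.python.client import device_lib
print([d.name for d in device_lib.list_local_devices()])
# Expected something like ['/device:CPU:0', '/device:GPU:0']: the one
# visible gpu is renumbered to 0, and all cpus collapse into cpu0.
{code}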

2.

  This is the logic Omer and I came up with for assigning each segment to a unique gpu.  The gpu number a segment gets is just its segment id modulo the number of segments (gpus) per host.  In other words, if there are 4 segments on each host, they will get gpus 0, 1, 2, and 3 respectively (to each, its own gpu will appear as gpu0).  This logic works only if every host has the same number of segments and there is at least 1 gpu per segment: each segment gets its own gpu, and any extra gpus are ignored.  (A sketch follows below.)

  The formula will have to be modified if hosts have different numbers of segments.  I think that means we need to pass around an array holding the number of segments on each host, or at the very least the segment id of the first segment on each host; hopefully that can just be queried from a system table.  For postgres, the current logic works fine as long as we set gp_segment_id = 1 and segments_per_host = 1.  But with postgres it doesn't matter anyway whether we hide gpus beyond gpu0, so it would be even easier to just skip calling the device detection function entirely.
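
A minimal sketch of that assignment rule (the helper name set_visible_gpu is an assumption for illustration; only the modulo expression matches the current code):

{code:python}
import os

def set_visible_gpu(current_seg_id, segments_per_host):
    # Pin this segment to one physical gpu; TF will then see it as
    # gpu0.  Assumes every host has segments_per_host segments and
    # at least that many gpus.
    os.environ['CUDA_VISIBLE_DEVICES'] = str(current_seg_id % segments_per_host)

# Greenplum, 4 segments per host: seg ids 0-3 on host A map to gpus
# 0-3, seg ids 4-7 on host B map to gpus 0-3 again, and so on.
# Postgres: 1 % 1 == 0, so the lone "segment" gets gpu0 (or we just
# never call this at all).
{code}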

3. 

a.)  For gpus < segments, I think the best behavior would be for the segments that don't have a gpu to fall back on using the cpu.  It might even work as-is, but we should test it; if a change is required, I suspect it will be minimal (a sketch follows after 3b).  The real question is whether anyone will want to run like this, knowing that the slowest segment is the one that determines the runtime (i.e., they probably won't get performance any better than if they ran with all cpus).

b.) This should work fine as-is, no changes required: extra gpus are ignored.
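
For 3a, a hedged sketch of what the cpu fallback could look like if a change does turn out to be needed (pick_device and gpus_on_host are assumptions for illustration, not existing code):

{code:python}
import os

def pick_device(current_seg_id, segments_per_host, gpus_on_host):
    gpu_id = current_seg_id % segments_per_host
    if gpu_id < gpus_on_host:
        os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
        return '/gpu:0'
    # Not enough gpus on this host: hide them all so TF uses the cpu.
    os.environ['CUDA_VISIBLE_DEVICES'] = ''
    return '/cpu:0'
{code}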


> Change GPU related hardcoded things in madlib_keras.py_in
> ---------------------------------------------------------
>
>                 Key: MADLIB-1308
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1308
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Deep Learning
>            Reporter: Nandish Jayaram
>            Priority: Major
>             Fix For: v1.16
>
>
> Based on the code in PR [https://github.com/apache/madlib/pull/355]:
>  # Currently in madlib_keras.py_in, we hardcode the following things:
>  ## gpus_per_host = 4
>  ## device_name = '/cpu:0' or '/gpu:0' (can the device ever not be named gpu0 or cpu0?)
>  # Look into and document the usage of {{CUDA_VISIBLE_DEVICES}} when gpu_only is set to TRUE. Currently we set it to str(current_seg_id % gpus_per_host). How does this logic work, and will it always work? How would this logic change for Postgres, since it has no segment_id?
>  # What happens if:
>  ## no. of gpus < no. of segments
>  ## no. of gpus > no. of segments



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)