Posted to issues@madlib.apache.org by "Frank McQuillan (Jira)" <ji...@apache.org> on 2019/11/07 00:41:00 UTC
[jira] [Comment Edited] (MADLIB-1390) DL: helper function for asymmetric cluster config
[ https://issues.apache.org/jira/browse/MADLIB-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968826#comment-16968826 ]
Frank McQuillan edited comment on MADLIB-1390 at 11/7/19 12:40 AM:
-------------------------------------------------------------------
Tests on https://github.com/apache/madlib/pull/455:
{code}
SELECT * FROM madlib.gpu_configuration();
hostname | gpu_descr
----------+------------------------------------------------------------------------------------------
phoenix0 | device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0
phoenix0 | device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0
phoenix0 | device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0
phoenix0 | device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0
phoenix1 | device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0
phoenix1 | device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0
phoenix1 | device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0
phoenix1 | device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0
phoenix2 | device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0
phoenix2 | device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0
phoenix2 | device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0
phoenix2 | device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0
phoenix3 | device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0
phoenix3 | device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0
phoenix3 | device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0
phoenix3 | device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0
phoenix4 | device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0
phoenix4 | device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0
phoenix4 | device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0
phoenix4 | device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0
(20 rows)
{code}
OK
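For anyone consuming this output programmatically, here is a minimal Python sketch that splits a gpu_descr value into fields and counts GPUs per host, which is one way to spot an asymmetric cluster. It assumes the "key: value, key: value" layout shown above; parse_gpu_descr and gpus_per_host are hypothetical helper names, not part of MADlib.

```python
from collections import Counter


def parse_gpu_descr(gpu_descr):
    """Split 'key: value, key: value' text into a dict.

    Assumes the layout seen in the output above; other sources
    may format gpu_descr differently.
    """
    fields = {}
    for part in gpu_descr.split(", "):
        # Split on the first ': ' only, so values such as a PCI bus id
        # (which contains bare colons) stay intact.
        key, _, value = part.partition(": ")
        fields[key.strip()] = value.strip()
    return fields


def gpus_per_host(rows):
    """Count rows per hostname; rows are (hostname, gpu_descr) tuples, one per GPU."""
    return Counter(hostname for hostname, _ in rows)


# Sample rows copied from the query output above, plus a deliberately
# shortened second host to illustrate an asymmetric configuration.
rows = [
    ("phoenix0", "device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0"),
    ("phoenix0", "device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0"),
    ("phoenix1", "device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0"),
]

first_gpu = parse_gpu_descr(rows[0][1])
counts = gpus_per_host(rows)
```

Hosts whose counts differ from one another indicate an asymmetric GPU layout.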
However, the following does not work:
{code}
CREATE TABLE gpu_resources AS
SELECT * FROM madlib.gpu_configuration();
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL policy entry.
ERROR: plpy.SPIError: function cannot execute on segment because it issues a non-SELECT statement (entry db 10.138.0.41:5432 pid=16886)
DETAIL:
Traceback (most recent call last):
PL/Python function "gpu_configuration", line 21, in <module>
with AOControl(False) and MinWarning("error"):
PL/Python function "gpu_configuration", line 154, in __enter__
PL/Python function "gpu_configuration"
SQL function "gpu_configuration" statement 1
{code}
I think this is a problem, so we should have the function create an output table instead of being a set-returning function.
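A rough Python sketch of what the output-table variant might look like, under stated assumptions: gpu_configuration_to_table is a hypothetical name, the `execute` argument stands in for a statement runner such as plpy.execute, and the two-column (hostname, gpu_descr) layout is taken from the output above. It is not the code in the pull request.

```python
def gpu_configuration_to_table(execute, output_table, rows):
    """Materialize (hostname, gpu_descr) rows into output_table.

    execute      -- callable that runs one SQL statement (assumed; e.g. plpy.execute)
    output_table -- table name, assumed to be already validated by the caller
    rows         -- iterable of (hostname, gpu_descr) tuples, one per GPU
    """
    def literal(text):
        # Double embedded single quotes so the value is a valid SQL string literal.
        return "'" + text.replace("'", "''") + "'"

    rows = list(rows)
    execute("CREATE TABLE {0} (hostname text, gpu_descr text)".format(output_table))
    if rows:
        values = ", ".join(
            "({0}, {1})".format(literal(host), literal(descr)) for host, descr in rows
        )
        execute("INSERT INTO {0} VALUES {1}".format(output_table, values))


# Usage with a stand-in executor that just records the statements.
statements = []
gpu_configuration_to_table(
    statements.append,
    "gpu_resources",
    [("phoenix0", "device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0")],
)
```

Because the function creates the table itself, the caller would no longer need to wrap it in CREATE TABLE ... AS, which is the statement that fails above.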
> DL: helper function for asymmetric cluster config
> -------------------------------------------------
>
> Key: MADLIB-1390
> URL: https://issues.apache.org/jira/browse/MADLIB-1390
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Deep Learning
> Reporter: Nikhil Kak
> Priority: Major
> Fix For: v1.17
>
>
> h3. Helper function
> *Interface*
> gpu_configuration(source text) -- optional parameter; one of 'tensorflow' or 'nvidia'. Defaults to 'tensorflow'.
> SELECT * FROM madlib.gpu_configuration();
> SELECT * FROM madlib.gpu_configuration('tensorflow');
> SELECT * FROM madlib.gpu_configuration('nvidia');
>
> *GPDB*
> List the state of the cluster, however it is configured:
> SELECT * FROM madlib.gpu_configuration();
>
> gpu_descr | hostname
> ------------------+--------------------------
> NVIDIA Tesla P100 | host1
> NVIDIA Tesla P100 | host1
> Super Duper GPU | host2
> Super Duper GPU | host2
> (4 rows)
>
> Results may come back in a slightly different format, depending on what information is available about the GPU, but there should be one row per GPU.
>
> Details:
> # Ignore mirror segments and master host
>
> *POSTGRES*
> select * from madlib.gpu_configuration();
> hostname | gpu_descr
> ----------+------------
> localhost | NVIDIA Tesla P100
> (1 row)
>