Posted to dev@madlib.apache.org by GitBox <gi...@apache.org> on 2019/11/23 01:11:11 UTC
[GitHub] [madlib] fmcquillan99 commented on issue #459: DL: Add support for
asymmetric segment distribution to preprocessor
URL: https://github.com/apache/madlib/pull/459#issuecomment-557749546
Please review tests (0) and (5), which I think may need some changes.
(0)
I think `gpu_config` in the output table should be changed to `distribution_rules`
to match the name of the input parameter.
(1)
CPUs only
```
DROP TABLE IF EXISTS image_data_packed, image_data_packed_summary;
SELECT madlib.training_preprocessor_dl('image_data',        -- Source table
                                       'image_data_packed', -- Output table
                                       'species',           -- Dependent variable
                                       'rgb',               -- Independent variable
                                       NULL,                -- Buffer size
                                       255,                 -- Normalizing constant
                                       NULL,                -- Number of classes
                                       'all_segments'       -- Distribution rules
                                      );
SELECT * FROM image_data_packed_summary;
-[ RECORD 1 ]-------+------------------
source_table        | image_data
output_table        | image_data_packed
dependent_varname   | species
independent_varname | rgb
dependent_vartype   | text
class_values        | {bird,cat,dog}
buffer_size         | 26
normalizing_const   | 255
num_classes         | 3
gpu_config          | all_segments
```
OK
(2)
Table `xxx` exists but has the wrong format for a distribution table
```
DROP TABLE IF EXISTS image_data_packed, image_data_packed_summary;
SELECT madlib.training_preprocessor_dl('image_data',        -- Source table
                                       'image_data_packed', -- Output table
                                       'species',           -- Dependent variable
                                       'rgb',               -- Independent variable
                                       NULL,                -- Buffer size
                                       255,                 -- Normalizing constant
                                       NULL,                -- Number of classes
                                       'xxx'                -- Distribution rules
                                      );
ERROR: plpy.Error: training_preprocessor_dl: segments_to_use table must contain dbib column (plpython.c:5038)
CONTEXT: Traceback (most recent call last):
PL/Python function "training_preprocessor_dl", line 24, in <module>
training_preprocessor_obj.training_preprocessor_dl()
PL/Python function "training_preprocessor_dl", line 558, in training_preprocessor_dl
PL/Python function "training_preprocessor_dl", line 271, in input_preprocessor_dl
PL/Python function "training_preprocessor_dl", line 96, in _assert
PL/Python function "training_preprocessor_dl"
```
OK
(3)
Distribution table `yyy` does not exist
```
DROP TABLE IF EXISTS image_data_packed, image_data_packed_summary;
SELECT madlib.training_preprocessor_dl('image_data',        -- Source table
                                       'image_data_packed', -- Output table
                                       'species',           -- Dependent variable
                                       'rgb',               -- Independent variable
                                       NULL,                -- Buffer size
                                       255,                 -- Normalizing constant
                                       NULL,                -- Number of classes
                                       'yyy'                -- Distribution rules
                                      );
ERROR: plpy.Error: training_preprocessor_dl error: Input table 'yyy' does not exist. (plpython.c:5038)
DETAIL: segments_to_use table (yyy) doesn't exist.
CONTEXT: Traceback (most recent call last):
PL/Python function "training_preprocessor_dl", line 24, in <module>
training_preprocessor_obj.training_preprocessor_dl()
PL/Python function "training_preprocessor_dl", line 558, in training_preprocessor_dl
PL/Python function "training_preprocessor_dl", line 269, in input_preprocessor_dl
PL/Python function "training_preprocessor_dl", line 674, in input_tbl_valid
PL/Python function "training_preprocessor_dl"
```
OK
(4)
Request GPUs when no GPUs are configured on the cluster
```
DROP TABLE IF EXISTS image_data_packed, image_data_packed_summary;
SELECT madlib.training_preprocessor_dl('image_data',        -- Source table
                                       'image_data_packed', -- Output table
                                       'species',           -- Dependent variable
                                       'rgb',               -- Independent variable
                                       NULL,                -- Buffer size
                                       255,                 -- Normalizing constant
                                       NULL,                -- Number of classes
                                       'gpu_segments'       -- Distribution rules
                                      );
ERROR: plpy.Error: training_preprocessor_dl: No GPUs configured on hosts. (plpython.c:5038)
CONTEXT: Traceback (most recent call last):
PL/Python function "training_preprocessor_dl", line 24, in <module>
training_preprocessor_obj.training_preprocessor_dl()
PL/Python function "training_preprocessor_dl", line 558, in training_preprocessor_dl
PL/Python function "training_preprocessor_dl", line 243, in input_preprocessor_dl
PL/Python function "training_preprocessor_dl"
```
OK
(5)
Valid distribution table
```
DROP TABLE IF EXISTS segments_to_use;
CREATE TABLE segments_to_use AS
SELECT DISTINCT dbid, hostname FROM gp_segment_configuration
WHERE role='p' AND content>=0;
SELECT * FROM segments_to_use ORDER BY hostname, dbid;
 dbid |       hostname
------+-----------------------
    2 | pm-demo-machine-keras
    3 | pm-demo-machine-keras
(2 rows)
DROP TABLE IF EXISTS image_data_packed, image_data_packed_summary;
SELECT madlib.training_preprocessor_dl('image_data',        -- Source table
                                       'image_data_packed', -- Output table
                                       'species',           -- Dependent variable
                                       'rgb',               -- Independent variable
                                       NULL,                -- Buffer size
                                       255,                 -- Normalizing constant
                                       NULL,                -- Number of classes
                                       'segments_to_use'    -- Distribution rules
                                      );
SELECT * FROM image_data_packed_summary;
-[ RECORD 1 ]-------+------------------
source_table        | image_data
output_table        | image_data_packed
dependent_varname   | species
independent_varname | rgb
dependent_vartype   | text
class_values        | {bird,cat,dog}
buffer_size         | 26
normalizing_const   | 255
num_classes         | 3
gpu_config          | {0,1}
```
The `gpu_config` field shows `{0,1}`, which does not match the `dbid` values `{2,3}`. Where does `{0,1}` come from?
I think we should report `dbid`; otherwise users might get confused.
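One possible explanation, which is an assumption and not confirmed in this thread: `{0,1}` may be the segment `content` IDs rather than `dbid`. A quick way to check is to list both columns side by side for the primary segments, reusing the filter from the `segments_to_use` query in test (5):

```sql
-- Sketch: compare dbid with content for primary segments.
-- Assumption (unverified): gpu_config reports content IDs, not dbid.
SELECT dbid, content, hostname
FROM gp_segment_configuration
WHERE role='p' AND content>=0
ORDER BY dbid;
```

If `content` comes back as 0 and 1 for `dbid` 2 and 3, that would account for the mismatch.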