Posted to issues@madlib.apache.org by "Jingyi Mei (JIRA)" <ji...@apache.org> on 2018/05/25 01:14:01 UTC
[jira] [Comment Edited] (MADLIB-1237) Mini-batch preprocessor fails for dt_golf dataset
[ https://issues.apache.org/jira/browse/MADLIB-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490072#comment-16490072 ]
Jingyi Mei edited comment on MADLIB-1237 at 5/25/18 1:13 AM:
-------------------------------------------------------------
We hit this special-character issue in the following cases:
1. When we do a SELECT with a WHERE clause like
{code:java}
WHERE column_name = 'something'with*special$character'{code}
2. When we try to create a table with a column like
{code:java}
CREATE TABLE example_table AS
SELECT
'{ele'with*special_char, 'M,M', 'M$M'}'::text[] AS class_values{code}
Or
{code:java}
CREATE TABLE example_table AS
SELECT
ARRAY['ele'with*special_char', 'M,M', 'M$M'] AS class_values{code}
We need to handle all of these situations so that values containing special characters, and also Unicode, work.
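One possible direction, as a minimal sketch only and not the actual MADlib fix: escape each value before it is interpolated into the generated SQL, by doubling embedded single quotes. The helper names below (quote_literal_sketch, array_constructor_sketch, class_indicator_sketch) are hypothetical and do not exist in MADlib. The sketch assumes standard_conforming_strings is on, so backslashes need no special handling. Inside PL/Python on PostgreSQL 9.1 or later, plpy.quote_literal() could likely play the same role; whether it is available on every platform MADlib supports is not verified here.

```python
def quote_literal_sketch(value):
    """Return value as a SQL string literal, doubling embedded single quotes.

    Hypothetical helper; assumes standard_conforming_strings = on.
    """
    return "'" + value.replace("'", "''") + "'"


def array_constructor_sketch(values):
    """Build an ARRAY['a', 'b', ...] constructor from a list of strings."""
    return "ARRAY[" + ", ".join(quote_literal_sketch(v) for v in values) + "]"


def class_indicator_sketch(column, values):
    """Build the comparison array that appears in the failing query."""
    comparisons = ["({0}) = {1}".format(column, quote_literal_sketch(v))
                   for v in values]
    return "ARRAY[" + ", ".join(comparisons) + "]::INTEGER[]"


# Case 1: WHERE clause with a quote inside the compared value.
where_clause = ("WHERE column_name = "
                + quote_literal_sketch("something'with*special$character"))

# Case 2: array column whose elements contain quotes, commas and dollar signs.
class_values = array_constructor_sketch(["ele'with*special_char", "M,M", "M$M"])

# The expression from the dt_golf error, with 'Don't Play' escaped.
indicator = class_indicator_sketch("class", ["Don't Play", "Play"])

print(where_clause)
print(class_values)
print(indicator)
```

With the quote doubled, the expression from the reported error becomes ARRAY[(class) = 'Don''t Play', (class) = 'Play']::INTEGER[]. The '{...}'::text[] array-literal form has its own element-quoting rules (double quotes and backslash escapes inside the braces) and is not covered by this sketch.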
was (Author: jingyimei):
We have this special character issue when:
1. we do a select and have a where clause like
{code:java}
WHERE column_name = 'something'with*special$character'{code}
And 2.when we try to create a table with column like
{code:java}
CREATE TABLE example_table AS
SELECT
'{ele'with*special_char, 'M,M', 'M$M'}'::text[] AS class_values{code}
We need to handle both situations and make special character and also unicode work.
> Mini-batch preprocessor fails for dt_golf dataset
> --------------------------------------------------
>
> Key: MADLIB-1237
> URL: https://issues.apache.org/jira/browse/MADLIB-1237
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Priority: Major
> Fix For: v1.15
>
>
> For the dt_golf data set from
> http://madlib.apache.org/docs/latest/group__grp__decision__tree.html#examples
> minibatch pre-processor fails
> {code}
> SELECT madlib.minibatch_preprocessor('dt_golf',
> 'dt_golf_packed_2',
> 'class',
> '"Temp_Humidity"', NULL ,1, True);
> ERROR: spiexceptions.SyntaxError: syntax error at or near "t"
> LINE 8: ...T madlib.array_contains_null(ARRAY[(class) = 'Don't Play', (...
> ^
> QUERY:
> SELECT SUM(source_table_row_count_by_group) AS source_table_row_count,
> SUM(num_rows_processed_by_group) AS total_num_rows_processed,
> AVG(num_rows_processed_by_group) AS avg_num_rows_processed
> FROM (
> SELECT COUNT(*) AS source_table_row_count_by_group,
> SUM(CASE
> WHEN NOT madlib.array_contains_null(ARRAY[(class) = 'Don't Play', (class) = 'Play']::INTEGER[]) AND
> NOT madlib.array_contains_null(("Temp_Humidity")::DOUBLE PRECISION[])
> THEN 1
> ELSE 0
> END) AS num_rows_processed_by_group
> FROM dt_golf
> ) AS s
> CONTEXT: Traceback (most recent call last):
> PL/Python function "minibatch_preprocessor", line 24, in <module>
> minibatch_preprocessor_obj.minibatch_preprocessor()
> PL/Python function "minibatch_preprocessor", line 45, in wrapper
> PL/Python function "minibatch_preprocessor", line 104, in minibatch_preprocessor
> PL/Python function "minibatch_preprocessor", line 236, in _get_skipped_rows_processed_count
> PL/Python function "minibatch_preprocessor"
> {code}