You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/09/25 23:05:16 UTC
[jira] Created: (HIVE-857) Transform should produce objects in the
same type as specified at runtime
Transform should produce objects in the same type as specified at runtime
-------------------------------------------------------------------------
Key: HIVE-857
URL: https://issues.apache.org/jira/browse/HIVE-857
Project: Hadoop Hive
Issue Type: Bug
Affects Versions: 0.4.1, 0.5.0
Reporter: Zheng Shao
The following code fails at runtime, because the "clicked" field produced by "transform" is actually of type String at runtime, instead of boolean.
{code}
INSERT OVERWRITE TABLE feature_index
SELECT
temp.feature,
temp.ad_id,
sum(if(temp.clicked, 1, 0)) / cast(count(temp.clicked) as DOUBLE) as clicked_percent
FROM (
SELECT concat('ua:', trim(lower(ua.feature))) as feature, ua.ad_id, ua.clicked
FROM (
MAP raw_logs.user_agent, raw_logs.ad_id, raw_logs.clicked
USING 'my.py' as (feature STRING, ad_id STRING, clicked BOOLEAN)
FROM raw_logs
) ua
) temp
GROUP BY temp.feature, temp.ad_id;
{code}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-857) Transform should produce objects in the
same type as specified at runtime
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain resolved HIVE-857.
-----------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Committed, Thanks Zheng
> Transform should produce objects in the same type as specified at runtime
> -------------------------------------------------------------------------
>
> Key: HIVE-857
> URL: https://issues.apache.org/jira/browse/HIVE-857
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.4.0, 0.4.1
> Reporter: Zheng Shao
> Assignee: Zheng Shao
> Attachments: HIVE-857.1.patch
>
>
> The following code fails at runtime, because the "clicked" field produced by "transform" is actually of type String at runtime, instead of boolean.
> {code}
> INSERT OVERWRITE TABLE feature_index
> SELECT
> temp.feature,
> temp.ad_id,
> sum(if(temp.clicked, 1, 0)) / cast(count(temp.clicked) as DOUBLE) as clicked_percent
> FROM (
> SELECT concat('ua:', trim(lower(ua.feature))) as feature, ua.ad_id, ua.clicked
> FROM (
> MAP raw_logs.user_agent, raw_logs.ad_id, raw_logs.clicked
> USING 'my.py' as (feature STRING, ad_id STRING, clicked BOOLEAN)
> FROM raw_logs
> ) ua
> ) temp
> GROUP BY temp.feature, temp.ad_id;
> {code}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-857) Transform should produce objects in the
same type as specified at runtime
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao reassigned HIVE-857:
-------------------------------
Assignee: Zheng Shao
> Transform should produce objects in the same type as specified at runtime
> -------------------------------------------------------------------------
>
> Key: HIVE-857
> URL: https://issues.apache.org/jira/browse/HIVE-857
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.4.0, 0.4.1
> Reporter: Zheng Shao
> Assignee: Zheng Shao
> Attachments: HIVE-857.1.patch
>
>
> The following code fails at runtime, because the "clicked" field produced by "transform" is actually of type String at runtime, instead of boolean.
> {code}
> INSERT OVERWRITE TABLE feature_index
> SELECT
> temp.feature,
> temp.ad_id,
> sum(if(temp.clicked, 1, 0)) / cast(count(temp.clicked) as DOUBLE) as clicked_percent
> FROM (
> SELECT concat('ua:', trim(lower(ua.feature))) as feature, ua.ad_id, ua.clicked
> FROM (
> MAP raw_logs.user_agent, raw_logs.ad_id, raw_logs.clicked
> USING 'my.py' as (feature STRING, ad_id STRING, clicked BOOLEAN)
> FROM raw_logs
> ) ua
> ) temp
> GROUP BY temp.feature, temp.ad_id;
> {code}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-857) Transform should produce objects in
the same type as specified at runtime
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759774#action_12759774 ]
Zheng Shao commented on HIVE-857:
---------------------------------
This only affects 0.4.
The transaction that added the new syntax is r803371, HIVE-743. Let user specify serde for custom sctipts. (Namit Jain via rmurthy)
The transaction that fixed the issue is r807330, HIVE-708. Add TypedBytes SerDe for transform. (Namit Jain via zshao).
We may either pull in the whole of HIVE-708 to branch 0.4, or split HIVE-708's transaction.
> Transform should produce objects in the same type as specified at runtime
> -------------------------------------------------------------------------
>
> Key: HIVE-857
> URL: https://issues.apache.org/jira/browse/HIVE-857
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.4.0, 0.4.1
> Reporter: Zheng Shao
>
> The following code fails at runtime, because the "clicked" field produced by "transform" is actually of type String at runtime, instead of boolean.
> {code}
> INSERT OVERWRITE TABLE feature_index
> SELECT
> temp.feature,
> temp.ad_id,
> sum(if(temp.clicked, 1, 0)) / cast(count(temp.clicked) as DOUBLE) as clicked_percent
> FROM (
> SELECT concat('ua:', trim(lower(ua.feature))) as feature, ua.ad_id, ua.clicked
> FROM (
> MAP raw_logs.user_agent, raw_logs.ad_id, raw_logs.clicked
> USING 'my.py' as (feature STRING, ad_id STRING, clicked BOOLEAN)
> FROM raw_logs
> ) ua
> ) temp
> GROUP BY temp.feature, temp.ad_id;
> {code}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-857) Transform should produce objects in the
same type as specified at runtime
Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-857:
--------------------------------
Fix Version/s: 0.4.1
Affects Version/s: (was: 0.4.1)
Component/s: Query Processor
> Transform should produce objects in the same type as specified at runtime
> -------------------------------------------------------------------------
>
> Key: HIVE-857
> URL: https://issues.apache.org/jira/browse/HIVE-857
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.4.0
> Reporter: Zheng Shao
> Assignee: Zheng Shao
> Fix For: 0.4.1
>
> Attachments: HIVE-857.1.patch
>
>
> The following code fails at runtime, because the "clicked" field produced by "transform" is actually of type String at runtime, instead of boolean.
> {code}
> INSERT OVERWRITE TABLE feature_index
> SELECT
> temp.feature,
> temp.ad_id,
> sum(if(temp.clicked, 1, 0)) / cast(count(temp.clicked) as DOUBLE) as clicked_percent
> FROM (
> SELECT concat('ua:', trim(lower(ua.feature))) as feature, ua.ad_id, ua.clicked
> FROM (
> MAP raw_logs.user_agent, raw_logs.ad_id, raw_logs.clicked
> USING 'my.py' as (feature STRING, ad_id STRING, clicked BOOLEAN)
> FROM raw_logs
> ) ua
> ) temp
> GROUP BY temp.feature, temp.ad_id;
> {code}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-857) Transform should produce objects in the
same type as specified at runtime
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-857:
----------------------------
Attachment: HIVE-857.1.patch
This patch is mainly a subset of HIVE-708, which does NOT include TypedBytes. TypedBytes is a new feature - we do not need to propagate it back to branch-0.4.
> Transform should produce objects in the same type as specified at runtime
> -------------------------------------------------------------------------
>
> Key: HIVE-857
> URL: https://issues.apache.org/jira/browse/HIVE-857
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.4.0, 0.4.1
> Reporter: Zheng Shao
> Attachments: HIVE-857.1.patch
>
>
> The following code fails at runtime, because the "clicked" field produced by "transform" is actually of type String at runtime, instead of boolean.
> {code}
> INSERT OVERWRITE TABLE feature_index
> SELECT
> temp.feature,
> temp.ad_id,
> sum(if(temp.clicked, 1, 0)) / cast(count(temp.clicked) as DOUBLE) as clicked_percent
> FROM (
> SELECT concat('ua:', trim(lower(ua.feature))) as feature, ua.ad_id, ua.clicked
> FROM (
> MAP raw_logs.user_agent, raw_logs.ad_id, raw_logs.clicked
> USING 'my.py' as (feature STRING, ad_id STRING, clicked BOOLEAN)
> FROM raw_logs
> ) ua
> ) temp
> GROUP BY temp.feature, temp.ad_id;
> {code}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-857) Transform should produce objects in the
same type as specified at runtime
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-857:
----------------------------
Affects Version/s: (was: 0.5.0)
0.4.0
> Transform should produce objects in the same type as specified at runtime
> -------------------------------------------------------------------------
>
> Key: HIVE-857
> URL: https://issues.apache.org/jira/browse/HIVE-857
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.4.0, 0.4.1
> Reporter: Zheng Shao
>
> The following code fails at runtime, because the "clicked" field produced by "transform" is actually of type String at runtime, instead of boolean.
> {code}
> INSERT OVERWRITE TABLE feature_index
> SELECT
> temp.feature,
> temp.ad_id,
> sum(if(temp.clicked, 1, 0)) / cast(count(temp.clicked) as DOUBLE) as clicked_percent
> FROM (
> SELECT concat('ua:', trim(lower(ua.feature))) as feature, ua.ad_id, ua.clicked
> FROM (
> MAP raw_logs.user_agent, raw_logs.ad_id, raw_logs.clicked
> USING 'my.py' as (feature STRING, ad_id STRING, clicked BOOLEAN)
> FROM raw_logs
> ) ua
> ) temp
> GROUP BY temp.feature, temp.ad_id;
> {code}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-857) Transform should produce objects in
the same type as specified at runtime
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759744#action_12759744 ]
Zheng Shao commented on HIVE-857:
---------------------------------
Explain and stack traces. Thanks for Andrew Hitchcock for reporting this:
{code}
EXPLAIN INSERT OVERWRITE TABLE feature_index
SELECT
temp.feature,
temp.ad_id,
sum(if(temp.clicked, 1, 0)) / cast(count(temp.clicked) as DOUBLE) as clicked_percent
FROM (
SELECT concat('ua:', trim(lower(ua.feature))) as feature, ua.ad_id, ua.clicked
FROM (
MAP raw_logs.user_agent, raw_logs.ad_id, raw_logs.clicked
USING 's3://anhi-test-programs/hive/split_user_agent.py' as (feature STRING, ad_id STRING, clicked BOOLEAN)
FROM raw_logs
) ua
) temp
GROUP BY temp.feature, temp.ad_id;
OK
ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_QUERY (TOK_FROM (TOK_TABREF raw_logs)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TRANSFORM (TOK_EXPLIST (. (TOK_TABLE_OR_COL raw_logs) user_agent) (. (TOK_TABLE_OR_COL raw_logs) ad_id) (. (TOK_TABLE_OR_COL raw_logs) clicked)) 's3://anhi-test-programs/hive/split_user_agent.py' (TOK_TABCOLLIST (TOK_TABCOL feature TOK_STRING) (TOK_TABCOL ad_id TOK_STRING) (TOK_TABCOL clicked TOK_BOOLEAN))))))) ua)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_FUNCTION concat 'ua:' (TOK_FUNCTION trim (TOK_FUNCTION lower (. (TOK_TABLE_OR_COL ua) feature)))) feature) (TOK_SELEXPR (. (TOK_TABLE_OR_COL ua) ad_id)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL ua) clicked))))) temp)) (TOK_INSERT (TOK_DESTINATION (TOK_TAB feature_index)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL temp) feature)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL temp) ad_id)) (TOK_SELEXPR (/ (TOK_FUNCTION sum (TOK_FUNCTION if (. (TOK_TABLE_OR_COL temp) clicked) 1 0)) (TOK_FUNCTION TOK_DOUBLE (TOK_FUNCTION count (. (TOK_TABLE_OR_COL temp) clicked)))) clicked_percent)) (TOK_GROUPBY (. (TOK_TABLE_OR_COL temp) feature) (. (TOK_TABLE_OR_COL temp) ad_id))))
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
temp:ua:raw_logs
TableScan
alias: raw_logs
Select Operator
expressions:
expr: user_agent
type: string
expr: ad_id
type: string
expr: clicked
type: boolean
outputColumnNames: _col0, _col1, _col2
Transform Operator
command: s3://anhi-test-programs/hive/split_user_agent.py
output info:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Select Operator
expressions:
expr: concat('ua:', trim(lower(feature)))
type: string
expr: ad_id
type: string
expr: clicked
type: boolean
outputColumnNames: _col0, _col1, _col2
Select Operator
expressions:
expr: _col0
type: string
expr: _col1
type: string
expr: _col2
type: boolean
outputColumnNames: _col0, _col1, _col2
Group By Operator
aggregations:
expr: sum(if(_col2, 1, 0))
expr: count(_col2)
keys:
expr: _col0
type: string
expr: _col1
type: string
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Reduce Output Operator
key expressions:
expr: _col0
type: string
expr: _col1
type: string
sort order: ++
Map-reduce partition columns:
expr: _col0
type: string
expr: _col1
type: string
tag: -1
value expressions:
expr: _col2
type: bigint
expr: _col3
type: bigint
Reduce Operator Tree:
Group By Operator
aggregations:
expr: sum(VALUE._col0)
expr: count(VALUE._col1)
keys:
expr: KEY._col0
type: string
expr: KEY._col1
type: string
mode: mergepartial
outputColumnNames: _col0, _col1, _col2, _col3
Select Operator
expressions:
expr: _col0
type: string
expr: _col1
type: string
expr: (UDFToDouble(_col2) / UDFToDouble(_col3))
type: double
outputColumnNames: _col0, _col1, _col2
File Output Operator
compressed: false
GlobalTableId: 1
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: feature_index
Stage: Stage-0
Move Operator
tables:
replace: true
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: feature_index
And the Hadoop stack trace:
java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:110)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:33)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:224)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2204)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot initialize ScriptOperator
at org.apache.hadoop.hive.ql.exec.ScriptOperator.initializeOp(ScriptOperator.java:266)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:295)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:332)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:317)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:58)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:295)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:332)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:317)
at org.apache.hadoop.hive.ql.exec.Operator.initializeOp(Operator.java:303)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:295)
at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:301)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:295)
at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:82)
... 7 more
Caused by: org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: The first argument of function IF should be "boolean", but "string"
is found
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFIf.initialize(GenericUDFIf.java:58)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:179)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:295)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:332)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:317)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:58)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:295)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:332)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:317)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:58)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:295)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:332)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:317)
at org.apache.hadoop.hive.ql.exec.ScriptOperator.initializeOp(ScriptOperator.java:262)
... 19 more
{code}
> Transform should produce objects in the same type as specified at runtime
> -------------------------------------------------------------------------
>
> Key: HIVE-857
> URL: https://issues.apache.org/jira/browse/HIVE-857
> Project: Hadoop Hive
> Issue Type: Bug
> Affects Versions: 0.4.1, 0.5.0
> Reporter: Zheng Shao
>
> The following code fails at runtime, because the "clicked" field produced by "transform" is actually of type String at runtime, instead of boolean.
> {code}
> INSERT OVERWRITE TABLE feature_index
> SELECT
> temp.feature,
> temp.ad_id,
> sum(if(temp.clicked, 1, 0)) / cast(count(temp.clicked) as DOUBLE) as clicked_percent
> FROM (
> SELECT concat('ua:', trim(lower(ua.feature))) as feature, ua.ad_id, ua.clicked
> FROM (
> MAP raw_logs.user_agent, raw_logs.ad_id, raw_logs.clicked
> USING 'my.py' as (feature STRING, ad_id STRING, clicked BOOLEAN)
> FROM raw_logs
> ) ua
> ) temp
> GROUP BY temp.feature, temp.ad_id;
> {code}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.