Posted to common-dev@hadoop.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2008/11/01 00:23:44 UTC
[jira] Commented: (HADOOP-4569) Hive: new syntax for specifying custom map/reduce scripts
[ https://issues.apache.org/jira/browse/HADOOP-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644461#action_12644461 ]
Zheng Shao commented on HADOOP-4569:
------------------------------------
The old syntax for doing that was:
FROM (
FROM pv_users
SELECT TRANSFORM(pv_users.userid, pv_users.date)
AS(key, value)
USING 'map_script'
CLUSTER BY key ) map_output
INSERT OVERWRITE TABLE pv_users_reduced
SELECT TRANSFORM(map_output.key, map_output.value)
AS (date, count)
USING 'reduce_script';
We plan to change that to:
FROM (
FROM pv_users
MAP pv_users.userid, pv_users.date
USING 'map_script'
AS key, value
CLUSTER BY key
) map_output
INSERT OVERWRITE TABLE pv_users_reduced
REDUCE map_output.key, map_output.value
USING 'reduce_script'
AS date, count;
Each script is expected to read tab-separated fields and to generate tab-separated fields.
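As an illustration of the tab-separated contract, here is a minimal sketch of what a 'map_script' could look like in Python. The actual script is not shown in this issue, so the function names (map_lines, run) and the choice to emit date as the key and userid as the value are assumptions made only for this example.

```python
import sys


def map_lines(lines):
    """Turn tab-separated (userid, date) rows into tab-separated (key, value) rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # assumption: silently skip rows with too few fields
        userid, date = fields[0], fields[1]
        # Assumption: date becomes the key and userid the value.
        yield "\t".join((date, userid))


def run(stream_in=sys.stdin, stream_out=sys.stdout):
    """Read rows from stream_in and write the mapped rows to stream_out."""
    for row in map_lines(stream_in):
        stream_out.write(row + "\n")
```

A script file used with USING would end with a call to run(), so that it reads rows from standard input and writes rows to standard output.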
The major changes are:
• Schemaless Mapper/Reducer: if there is no "AS" clause, we assume "AS key, value", which puts the bytes before the first tab into key and the rest into value.
• SELECT TRANSFORM changed to MAP/REDUCE to make it clear what is map and what is reduce.
• Reordered USING and AS to make it clearer.
• Support different shuffling/sorting keys by using "DISTRIBUTE BY" and "SORT BY" ("CLUSTER BY key" means "DISTRIBUTE BY key SORT BY key ASC")
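The schemaless split and the relationship between CLUSTER BY, DISTRIBUTE BY and SORT BY can be sketched in Python as follows. This is a toy model, not Hive's implementation: the function names, the crc32-based partitioning, the default of two reducers, and the None value for a row without a tab are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def split_key_value(line):
    """Schemaless split: text before the first tab is the key, the rest is the value."""
    row = line.rstrip("\n")
    key, sep, value = row.partition("\t")
    # Assumption: a row with no tab yields value None; the issue does not say.
    return key, (value if sep else None)


def shuffle(rows, distribute_key, sort_key, num_reducers=2):
    """Model DISTRIBUTE BY ... SORT BY ...: partition by one key, sort by another."""
    partitions = defaultdict(list)
    for row in rows:
        # Rows with equal distribute keys always land in the same partition.
        digest = zlib.crc32(str(distribute_key(row)).encode("utf-8"))
        partitions[digest % num_reducers].append(row)
    # Each partition is sorted independently in ascending order.
    return {p: sorted(rs, key=sort_key) for p, rs in partitions.items()}


def cluster_by(rows, key, num_reducers=2):
    """Model CLUSTER BY key as DISTRIBUTE BY key SORT BY key ASC."""
    return shuffle(rows, distribute_key=key, sort_key=key, num_reducers=num_reducers)
```

Passing different functions for distribute_key and sort_key models the case where rows are sent to reducers by one column but ordered within each reducer by another.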
> Hive: new syntax for specifying custom map/reduce scripts
> ---------------------------------------------------------
>
> Key: HADOOP-4569
> URL: https://issues.apache.org/jira/browse/HADOOP-4569
> Project: Hadoop Core
> Issue Type: Improvement
> Reporter: Zheng Shao
>
> In Hive we not only support SQL but also want to support custom scripts.