Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2010/08/30 20:50:57 UTC
[jira] Created: (PIG-1580) new syntax for native mapreduce operator
new syntax for native mapreduce operator
----------------------------------------
Key: PIG-1580
URL: https://issues.apache.org/jira/browse/PIG-1580
Project: Pig
Issue Type: Task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
The mapreduce operator (PIG-506) and the stream operator have some similarities, so it makes sense to use a similar syntax for both.
Alan has proposed the following syntax for the mapreduce operator, and that we also move the stream operator to a similar syntax in a future release.
MAPREDUCE id jar
INPUT 'path' USING LoadFunc
OUTPUT 'path' USING StoreFunc
[SHIP 'path' [, 'path' ...]]
[CACHE 'dfs_path#dfs_file' [, 'dfs_path#dfs_file' ...]]
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1580) new syntax for native mapreduce operator
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair updated PIG-1580:
-------------------------------
Fix Version/s: 0.8.0
[jira] Commented: (PIG-1580) new syntax for native mapreduce operator
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904298#action_12904298 ]
Thejas M Nair commented on PIG-1580:
------------------------------------
Updating the syntax to include support for parameters:
MAPREDUCE id jar 'params'
INPUT 'path' USING LoadFunc
OUTPUT 'path' USING StoreFunc
[SHIP 'path' [, 'path' ...]]
[CACHE 'dfs_path#dfs_file' [, 'dfs_path#dfs_file' ...]]
[jira] Resolved: (PIG-1580) new syntax for native mapreduce operator
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair resolved PIG-1580.
--------------------------------
Resolution: Won't Fix
With the 'hadoop jar' command, the files to ship to the distributed cache are specified using the -files command-line option. Since typical users will be moving an existing map-reduce job that they were running with 'hadoop jar', it is easier for them to copy the existing command-line options than to rewrite them as SHIP/CACHE clauses in the proposed syntax.
Without the SHIP/CACHE clauses, there is very little similarity between the stream and mapreduce operators. It would also be better to use LOAD/STORE instead of INPUT/OUTPUT in the mapreduce syntax, since these clauses specify load/store functions, not the streaming deserializer/serializer.
So I think it is better to go back to the old syntax. Resolving this JIRA as Won't Fix.
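For comparison, the LOAD/STORE-based form the resolution returns to is, to the best of our understanding, the shape the MAPREDUCE operator took in the Pig 0.8 documentation. The parameters go in backquotes and are passed to the jar much as they would be on a 'hadoop jar' command line. This is an illustrative sketch only: the jar name, the paths, the class name org.myorg.WordCount and the schema are placeholders that do not come from this thread.

```pig
-- Illustrative sketch; jar, paths, class name and schema are placeholders.
A = LOAD 'WordcountInput.txt';

-- STORE A INTO 'inputDir': Pig writes relation A where the native job expects its input.
-- LOAD 'outputDir' AS (...): Pig reads the native job's output back in as relation B.
-- The backquoted string holds the arguments handed to the jar, as with 'hadoop jar'.
B = MAPREDUCE 'wordcount.jar'
    STORE A INTO 'inputDir'
    LOAD 'outputDir' AS (word:chararray, count:int)
    `org.myorg.WordCount inputDir outputDir`;
```

Because the backquoted parameters mirror the 'hadoop jar' arguments, an option such as -files could presumably be copied over unchanged, assuming the job's main class parses Hadoop's generic options (for example through ToolRunner).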