Posted to dev@pig.apache.org by "Pi Song (JIRA)" <ji...@apache.org> on 2008/06/06 15:31:45 UTC

[jira] Created: (PIG-257) Allow usage of custom Hadoop InputFormat in Pig

Allow usage of custom Hadoop InputFormat in Pig
-----------------------------------------------

                 Key: PIG-257
                 URL: https://issues.apache.org/jira/browse/PIG-257
             Project: Pig
          Issue Type: New Feature
            Reporter: Pi Song


This very cool idea came out of a discussion on the mailing list (thanks, Manish Shah).

There is a semantic mismatch: a Hadoop InputFormat generally deals in key/value (K,V) pairs, but Pig expects a Tuple. We can solve this by placing K and V as fields in a Tuple.
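
A minimal sketch of the K,V-to-Tuple idea, assuming a plain java.util.List as a stand-in for Pig's Tuple. The class and method names (KeyValueTuples, toTuple) are hypothetical and are not part of Pig or Hadoop; the only point is that the key lands in field 0 and the value in field 1.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only. A real implementation would
// build a Pig Tuple from the key and value handed out by a Hadoop record reader.
final class KeyValueTuples {
    private KeyValueTuples() {}

    // Wrap one key/value record as a two-field tuple: field 0 = key, field 1 = value.
    static List<Object> toTuple(Object key, Object value) {
        return Arrays.asList(key, value);
    }
}
```

With a text-style input, for example, the key could be a byte offset and the value a line of text, so a Pig script would see each record as a two-field tuple.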

Given that we have rich built-in string/binary manipulation functions, Hadoop users shouldn't find it too costly to use Pig. This should definitely help accelerate Pig adoption.

After a brief look at the current code, this new feature will require changes in the Map-Reduce execution engine, so I will wait until the type branch is complete before starting work on this (if nobody else expresses interest in doing it :) ).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-257) Allow usage of custom Hadoop InputFormat in Pig

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604342#action_12604342 ] 

Olga Natkovich commented on PIG-257:
------------------------------------

Pi, how would InputFormat be specified?

[jira] Commented: (PIG-257) Allow usage of custom Hadoop InputFormat in Pig

Posted by "Pi Song (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12604815#action_12604815 ] 

Pi Song commented on PIG-257:
-----------------------------

We will need an implementation of LoadFunc that takes the InputFormat name as a parameter. I haven't actually looked at this in detail yet.
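
A rough sketch of what "takes the InputFormat name as a parameter" could look like, assuming the class is instantiated by reflection from its name. Everything here is hypothetical: RecordSource stands in for a Hadoop InputFormat, InMemorySource is a toy source, and NamedSourceLoader is not Pig's LoadFunc interface. It only illustrates the shape of the idea.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Hadoop InputFormat: a source of key/value records.
interface RecordSource {
    Iterator<Map.Entry<Object, Object>> records();
}

// Toy source used only to exercise the loader below.
class InMemorySource implements RecordSource {
    public Iterator<Map.Entry<Object, Object>> records() {
        List<Map.Entry<Object, Object>> data = Arrays.asList(
            new SimpleEntry<Object, Object>(0L, "first line"),
            new SimpleEntry<Object, Object>(11L, "second line"));
        return data.iterator();
    }
}

// Hypothetical loader: receives the source class name as a string parameter.
class NamedSourceLoader {
    private final Iterator<Map.Entry<Object, Object>> it;

    NamedSourceLoader(String sourceClassName) {
        try {
            // Resolve the class by name and require that it is a RecordSource.
            Class<? extends RecordSource> cls =
                Class.forName(sourceClassName).asSubclass(RecordSource.class);
            this.it = cls.getDeclaredConstructor().newInstance().records();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                "Cannot use " + sourceClassName + " as a record source", e);
        }
    }

    // Returns the next record as a (key, value) tuple, or null at end of input.
    List<Object> getNext() {
        if (!it.hasNext()) {
            return null;
        }
        Map.Entry<Object, Object> e = it.next();
        return Arrays.asList(e.getKey(), e.getValue());
    }
}
```

In a Pig script the class name would presumably arrive as a string argument to the load function; how that argument is actually passed is left open here, as it is in the comment above.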

[jira] Commented: (PIG-257) Allow usage of custom Hadoop InputFormat in Pig

Posted by "Pi Song (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603041#action_12603041 ] 

Pi Song commented on PIG-257:
-----------------------------

For OutputFormat, we may just have to allow a Bag of tuples of arity two.
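
One way to read the arity-two restriction, as a sketch only: each tuple in the bag is split into a key (field 0) and a value (field 1), and anything with a different number of fields is rejected. The bag and tuples are modeled as plain Java lists, and PairBags and toKeyValues are hypothetical names, not Pig or Hadoop APIs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only.
final class PairBags {
    private PairBags() {}

    // Convert a bag of two-field tuples into key/value pairs for an output writer.
    static List<Map.Entry<Object, Object>> toKeyValues(List<List<Object>> bag) {
        List<Map.Entry<Object, Object>> out = new ArrayList<>();
        for (List<Object> tuple : bag) {
            if (tuple.size() != 2) {
                throw new IllegalArgumentException(
                    "expected a tuple of arity 2, got arity " + tuple.size());
            }
            out.add(new SimpleEntry<>(tuple.get(0), tuple.get(1)));
        }
        return out;
    }
}
```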
