Posted to user@pig.apache.org by Manish Shah <ma...@rapleaf.com> on 2008/06/06 01:49:22 UTC

loading using a custom input formatter

We have a custom input formatter that we use for regular map/reduce
jobs. Is there a way to make use of this input formatter in Pig?
We've looked at most of the docs and haven't found much. The issue
we have is that we aren't loading data from a single file. Also, the
number of files cannot be determined ahead of time, so we can't just
write separate load commands in our Pig Latin.

The input formatter we have takes care of giving back records that
conform to the key/value semantics of Hadoop map/reduce functions. Is
there a reason it couldn't be used to generate tuples from the
resulting records?

- Manish
Co-Founder Rapleaf.com
http://www.rapleaf.com/pub/Manish-Shah


Re: loading using a custom input formatter

Posted by pi song <pi...@gmail.com>.
You're right. The first obstacle is the difference in input semantics:
Hadoop expects K,V but Pig expects a Tuple. However, this doesn't stop
us from encapsulating K,V as fields in a Tuple. I had a brief look at
PigInputFormat, and I think we could build a special input format that
lets users plug in an existing Hadoop input format. This would be a very
nice feature to have! BTW, I guess we'll have to wait a bit longer, as
this seems to require changes in the MapReduce execution engine, which
is currently being completely rewritten.
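
To make the "K,V as fields in a Tuple" idea concrete, here is a minimal sketch in plain Java. It deliberately does not use the real Hadoop or Pig classes: the class name KeyValueTupleAdapter, the methods toTuple and adapt, and the use of Map.Entry for a record and List<Object> for a tuple are all hypothetical stand-ins chosen for illustration. It only shows the shape of the adapter, not how such a thing would be wired into PigInputFormat.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Map.Entry stands in for a Hadoop key/value record and
// List<Object> stands in for a Pig Tuple. No real Hadoop or Pig API is used.
public class KeyValueTupleAdapter {

    /** Wraps one key/value record as a two-field tuple: (key, value). */
    static <K, V> List<Object> toTuple(K key, V value) {
        List<Object> tuple = new ArrayList<>(2);
        tuple.add(key);   // field 0 holds the key
        tuple.add(value); // field 1 holds the value
        return tuple;
    }

    /** Lazily adapts a stream of key/value records into a stream of tuples. */
    static <K, V> Iterator<List<Object>> adapt(final Iterator<Map.Entry<K, V>> records) {
        return new Iterator<List<Object>>() {
            @Override
            public boolean hasNext() {
                return records.hasNext();
            }

            @Override
            public List<Object> next() {
                Map.Entry<K, V> record = records.next();
                return toTuple(record.getKey(), record.getValue());
            }
        };
    }

    public static void main(String[] args) {
        // Made-up records, as an existing input format might hand them back.
        List<Map.Entry<String, Integer>> records = new ArrayList<>();
        records.add(new AbstractMap.SimpleEntry<>("alice", 3));
        records.add(new AbstractMap.SimpleEntry<>("bob", 5));

        Iterator<List<Object>> tuples = adapt(records.iterator());
        while (tuples.hasNext()) {
            System.out.println(tuples.next());
        }
    }
}
```

The adapter is lazy on purpose: it pulls one record at a time from the underlying reader, so it never needs to know how many files or records sit behind it.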

I will create a placeholder Jira for this.  Thanks a lot for your idea.

Pi
