Posted to user@pig.apache.org by Manish Shah <ma...@rapleaf.com> on 2008/05/29 21:53:08 UTC

loading thrift encoded data

I have a processing task that will span multiple map/reduce stages.
I think Pig will help in simplifying the creation of the task.  I was
wondering if anyone has written anything to make it possible to load  
tuples from data stored in a collection of files with the caveat that  
each record in the files is a set of bytes that have been encoded  
using Thrift's TBinaryProtocol.

I guess the protocol used to encode the data is not the real concern;
what matters is that the load function needs to first decode each
record with the same protocol and then build a tuple from it.  I'm
still learning about Pig, so I'd greatly appreciate it if someone
could point me in the right direction.  If no one has worked on
something similar, I'll try to contribute something back if more
people have a need for this.
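As a rough illustration of the "decode with the same protocol, then build a tuple" step, here is a minimal Java sketch that walks one TBinaryProtocol-encoded struct by hand. It relies only on the binary protocol's struct layout: each field is a 1-byte type id, a big-endian 2-byte field id, and the value, and a type id of 0 (STOP) ends the struct. Assumptions: the class name ThriftRecordDecoder is hypothetical, a plain List<Object> stands in for Pig's tuple type, and only primitive field types are handled. With the real Thrift Java library you would more likely deserialize into the generated class (for example via TDeserializer with a TBinaryProtocol.Factory) and copy its fields into a tuple.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: decodes one TBinaryProtocol struct into a "tuple"
// (here just a List<Object>) ordered by Thrift field id.
public class ThriftRecordDecoder {
    // Type ids used by the Thrift binary protocol (TType constants).
    private static final byte STOP = 0, BOOL = 2, BYTE = 3, DOUBLE = 4,
            I16 = 6, I32 = 8, I64 = 10, STRING = 11;

    public static List<Object> decode(byte[] record) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        // TreeMap keeps fields sorted by id, so the tuple order does not
        // depend on the order in which the writer emitted the fields.
        Map<Short, Object> fields = new TreeMap<>();
        while (true) {
            byte type = in.readByte();
            if (type == STOP) {
                break;                      // end-of-struct marker
            }
            short id = in.readShort();      // field id, big-endian i16
            fields.put(id, readValue(in, type));
        }
        return new ArrayList<>(fields.values());
    }

    private static Object readValue(DataInputStream in, byte type) throws IOException {
        switch (type) {
            case BOOL:   return in.readByte() != 0;
            case BYTE:   return in.readByte();
            case DOUBLE: return in.readDouble();   // 8-byte IEEE 754, big-endian
            case I16:    return in.readShort();
            case I32:    return in.readInt();
            case I64:    return in.readLong();
            case STRING: {
                int len = in.readInt();            // i32 length prefix
                byte[] buf = new byte[len];
                in.readFully(buf);
                return new String(buf, StandardCharsets.UTF_8);
            }
            default:
                // Nested structs, maps, sets and lists are left out of this sketch.
                throw new IOException("unsupported Thrift type id " + type);
        }
    }
}
```

One design caveat: because fields are packed by id, an optional field that is absent from a record shifts every later position, so a real loader would probably map field ids to fixed tuple positions taken from the schema and fill gaps with nulls.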

thanks!

- Manish
Co-Founder Rapleaf.com
http://www.rapleaf.com/pub/Manish-Shah


Re: loading thrift encoded data

Posted by Cosmin Lehene <cl...@adobe.com>.
Hi Manish,

You can create your own load function that implements the required interface and decodes each input record before returning it as a tuple.
See http://wiki.apache.org/pig/PigFunctions for more details, and look at the PigStorage class for a good example.
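To sketch the record-reading side of such a load function, the Java class below pulls one record at a time from an input stream and hands its bytes to a pluggable decoder, returning null at end of input. The method name getNext() mirrors the style of Pig load functions such as PigStorage, but this class does not implement Pig's actual LoadFunc interface, whose details vary by Pig version. Assumptions: the thread does not say how records are delimited inside the files, so this sketch assumes each record is preceded by a 4-byte big-endian length (a hypothetical framing), and FramedThriftLoader and RecordDecoder are hypothetical names.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

// Hypothetical loader-style reader for length-prefixed records.
public class FramedThriftLoader {
    /** Turns one encoded record (e.g. a TBinaryProtocol struct) into tuple fields. */
    public interface RecordDecoder {
        List<Object> decode(byte[] record) throws IOException;
    }

    private final DataInputStream in;
    private final RecordDecoder decoder;

    public FramedThriftLoader(InputStream in, RecordDecoder decoder) {
        this.in = new DataInputStream(in);
        this.decoder = decoder;
    }

    /** Returns the next decoded record, or null once the input is exhausted. */
    public List<Object> getNext() throws IOException {
        int first = in.read();
        if (first < 0) {
            return null;                    // end of input between records
        }
        // Assumed framing: 4-byte big-endian length; the remaining three bytes
        // throw EOFException if the prefix itself is truncated.
        int len = (first << 24)
                | (in.readUnsignedByte() << 16)
                | (in.readUnsignedByte() << 8)
                | in.readUnsignedByte();
        if (len < 0) {
            throw new IOException("negative record length " + len);
        }
        byte[] record = new byte[len];
        in.readFully(record);               // throws EOFException on a truncated record
        return decoder.decode(record);
    }
}
```

Keeping the decoder separate from the framing means the same loop works whichever protocol encoded the records, which matches the point that the protocol itself matters less than decoding each record before building the tuple.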

Good luck,
Cosmin

