Posted to user@pig.apache.org by Mark <st...@gmail.com> on 2011/04/07 03:30:00 UTC

Loading arbitrary objects

If I wanted to load arbitrary objects into tuples, what classes 
should I be looking at? Would I need some kind of storage class?

For example, I have a data file that contains 
org.apache.mahout.fpm.pfpgrowth.convertors.string.TopKStringPatterns 
objects. I would like to iterate over them in Pig using something like:

rows = LOAD 'data' using TopKStringPatternsStorage();

Is this correct? Is there any wiki on creating storages? Is there 
anything I should look out for?

Thanks for the pointers

Re: Loading arbitrary objects

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
If the arbitrary objects you refer to fit nicely into Pig's notion of 
tuples/bags/maps/primitives, then you can use those types directly.

Otherwise, because Pig schemas have limited support for complex/arbitrary 
objects (there is no support for something like Writable, for example), 
you will most probably need to treat the objects as bytearray (assuming 
they are serializable) and convert to/from byte[] wherever you use them. 
Pig currently does not allow you to decouple an object from its 
serialization.
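
A minimal sketch of that to/from byte[] round trip, in case it helps. 
It assumes the object implements java.io.Serializable, which may not 
hold for your class; if it is a Hadoop Writable instead, you would go 
through its own write/readFields methods. ByteArrayCodec is a 
hypothetical helper name, not something that ships with Pig or Mahout, 
and an ArrayList stands in for the real object. Inside Pig the 
resulting byte[] would typically be wrapped in a DataByteArray.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical helper (not part of Pig or Mahout): converts a
// Serializable object to a byte[] and back using Java serialization.
final class ByteArrayCodec {
    private ByteArrayCodec() {}

    // Serialize the object into an in-memory buffer.
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The object stream is closed (and flushed) before the copy is taken.
        return buffer.toByteArray();
    }

    // Rebuild the object; its class must be on the classpath.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class not on classpath", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the real object (assumption for illustration only).
        ArrayList<String> patterns = new ArrayList<>(Arrays.asList("milk", "bread"));
        byte[] bytes = toBytes(patterns);
        Object restored = fromBytes(bytes);
        System.out.println(restored.equals(patterns));
    }
}
```

The cost of this approach is that every UDF touching the field has to 
call fromBytes before it can do anything useful with the value.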


Regards,
Mridul



Re: Loading arbitrary objects

Posted by Daniel Dai <ji...@yahoo-inc.com>.
You need a LoadFunc. See 
http://pig.apache.org/docs/r0.8.0/udf.html#Load+Functions for how to 
write one.

Daniel
