Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2010/03/03 22:58:42 UTC

[Pig Wiki] Update of "Pig070LoadStoreHowTo" by PradeepKamath

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The "Pig070LoadStoreHowTo" page has been changed by PradeepKamath.
http://wiki.apache.org/pig/Pig070LoadStoreHowTo

--------------------------------------------------

New page:
= Overview =
This page describes how to write load functions and store functions using the API available in Pig 0.7.0.

== How to implement a Loader ==
The [[http://svn.apache.org/viewvc/hadoop/pig/trunk/src/org/apache/pig/LoadFunc.java?view=markup|LoadFunc]] abstract class has the main methods for loading data, and for most use cases it may suffice to extend it. There are three other optional interfaces which can be implemented to achieve extended functionality:
 * !LoadMetadata has methods to deal with metadata. Most loader implementations don't need to implement this unless they interact with some metadata system. The getSchema() method in this interface provides a way for loader implementations to communicate the schema of the data back to Pig. If a loader implementation returns data comprised of fields of real types (rather than !DataByteArray fields), it should provide the schema describing the data returned through the getSchema() method. The other methods are concerned with other types of metadata like partition keys and statistics. Implementations can return null for these methods if they are not applicable for that implementation.
 * !LoadPushDown has methods to push operations from the Pig runtime into loader implementations. Currently only projections are supported, i.e. the pushProjection() method is called by Pig to communicate to the loader exactly which fields are required in the Pig script. The loader implementation can choose to honor the request, or respond that it will not honor the request and return all fields in the data. If a loader implementation is able to efficiently return only the required fields, it should implement !LoadPushDown to improve query performance.
 * !LoadCaster has methods to convert byte arrays to specific types. A loader implementation should implement this if casts (implicit or explicit) from !DataByteArray fields to other types need to be supported. 
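
The projection and casting ideas described above can be illustrated with a minimal, self-contained sketch. It deliberately does not use the Pig classes: SketchLoader, its pushProjection(List<Integer>) and getNext(String) signatures, and the tab-delimited input are all hypothetical stand-ins chosen for illustration, not the actual Pig 0.7.0 method signatures. A real loader would extend the Pig abstract class and implement the Pig interfaces instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a loader. This is NOT Pig's LoadFunc,
// LoadPushDown or LoadCaster; it only mirrors the ideas in plain Java.
class SketchLoader {
    // Field indexes requested by the runtime; null means "return every field".
    private List<Integer> requiredFields = null;

    // Simplified analogue of pushProjection(): remember which fields are needed.
    // Returning true signals that this loader will honor the request.
    boolean pushProjection(List<Integer> fieldIndexes) {
        this.requiredFields = new ArrayList<>(fieldIndexes);
        return true;
    }

    // Split one tab-delimited line into raw byte-array fields, keeping only
    // the projected fields when a projection has been pushed.
    List<byte[]> getNext(String line) {
        String[] parts = line.split("\t", -1);
        List<byte[]> tuple = new ArrayList<>();
        if (requiredFields == null) {
            for (String part : parts) {
                tuple.add(part.getBytes(StandardCharsets.UTF_8));
            }
        } else {
            for (int idx : requiredFields) {
                // A requested field that is missing from the line becomes null.
                tuple.add(idx < parts.length
                        ? parts[idx].getBytes(StandardCharsets.UTF_8)
                        : null);
            }
        }
        return tuple;
    }

    // Simplified analogue of a caster method: convert raw bytes to an Integer,
    // returning null when the bytes are absent or not a valid integer.
    static Integer bytesToInteger(byte[] raw) {
        if (raw == null) {
            return null;
        }
        try {
            return Integer.valueOf(new String(raw, StandardCharsets.UTF_8).trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }
}

// Small driver showing the sketch in use.
class SketchLoaderDemo {
    public static void main(String[] args) {
        SketchLoader loader = new SketchLoader();
        loader.pushProjection(Arrays.asList(0, 2)); // only fields 0 and 2 are needed
        List<byte[]> tuple = loader.getNext("alice\tignored\t42");
        System.out.println(new String(tuple.get(0), StandardCharsets.UTF_8));
        System.out.println(SketchLoader.bytesToInteger(tuple.get(1)));
    }
}
```

In this sketch the loader skips the unneeded middle field entirely, which is the kind of saving a projection push-down is meant to enable, and the cast from bytes to an integer happens only when a typed value is actually requested.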

The !LoadFunc abstract class 




== How to implement a Storer ==