Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2007/12/11 01:11:32 UTC

[Pig Wiki] Update of "PigOverview" by ChrisOlston

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by ChrisOlston:
http://wiki.apache.org/pig/PigOverview

New page:
---+++ What is Pig:

 * Pig has two parts:
   * A language for processing data, called <i>Pig Latin</i>.
   * A set of <i>evaluation mechanisms</i> for evaluating a Pig Latin program. Current evaluation mechanisms include (1) local evaluation in a single JVM, and (2) evaluation by translation into one or more Map-Reduce jobs, executed using Hadoop.

---+++ Pig Latin programs:

 * Pig Latin has built-in relational-style operations such as filter, project, group, and join. Pig Latin also has a map operation, called "foreach", that applies a custom user function to every member of a set.

 * Additionally, users can incorporate their own custom code into essentially any Pig Latin operation. For example, if a user has a function that determines whether a given image contains a human face, the user can ask Pig to filter images according to this function. Pig will then evaluate this function over the images on the user's behalf. If the evaluation mechanism incorporates parallelism, as the Hadoop evaluation mechanism does, then the user's function will be executed in parallel.
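
The filter and foreach behaviour described above can be sketched in Python. This is only an analogy for the semantics, not Pig Latin syntax and not Pig's implementation. The names `contains_face`, `pig_filter`, and `foreach`, the `workers` parameter, and the sample records are all hypothetical, and a thread pool stands in for whatever parallelism an evaluation mechanism might provide.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical records: (image_name, tags). Real image bytes are out of scope here.
images = [
    ("a.jpg", ["face", "tree"]),
    ("b.jpg", ["car"]),
    ("c.jpg", ["face"]),
]

def contains_face(record):
    """Stand-in for a user-supplied predicate; a real one would inspect pixels."""
    _name, tags = record
    return "face" in tags

def pig_filter(bag, predicate, workers=1):
    """Keep the records for which the user's predicate is true.

    With workers > 1 the predicate calls are submitted to a thread pool,
    mimicking an evaluation mechanism that incorporates parallelism.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        keep = list(pool.map(predicate, bag))  # map preserves input order
    return [rec for rec, k in zip(bag, keep) if k]

def foreach(bag, func):
    """Apply a user function to every member of the bag (a map operation)."""
    return [func(rec) for rec in bag]

faces = pig_filter(images, contains_face, workers=2)
names = foreach(faces, lambda rec: rec[0])
```

The user writes only `contains_face`; how many workers evaluate it is a decision left to the evaluation side, which is the division of labour the text describes.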

---+++ Data:

 * Pig can process data of any format. Some standard formats, e.g. tab-delimited text files, are supported via built-in capabilities. A user can add support for another file format by writing a function that parses the bytes of a file into objects in Pig's data model, and a function that converts those objects back into bytes.
 * Pig's data model is similar to the relational data model, except that tuples can be nested. For example, you can have a table of tuples, where the third field of each tuple contains a table. In Pig, tables are called bags.
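
As a rough illustration of the two points above, the following Python sketch parses tab-delimited bytes into tuples, writes them back, and builds tuples whose third field is itself a bag. It assumes UTF-8 text and uses a Python list as a stand-in for a bag. The function names and the sample data are hypothetical and are not part of Pig.

```python
from collections import defaultdict

def parse_tab_delimited(raw: bytes):
    """Parse the bytes of a tab-delimited file into a bag (list) of tuples."""
    text = raw.decode("utf-8")
    return [tuple(line.split("\t")) for line in text.splitlines() if line]

def to_tab_delimited(bag) -> bytes:
    """The reverse direction: serialize a bag of flat tuples back to bytes."""
    return "".join("\t".join(t) + "\n" for t in bag).encode("utf-8")

def group_by_first_field(bag):
    """Build nested tuples (key, count, inner_bag); the third field is a bag."""
    groups = defaultdict(list)
    for t in bag:
        groups[t[0]].append(t)
    return [(key, len(inner), inner) for key, inner in groups.items()]

raw = b"alice\tapple\nbob\tpear\nalice\tplum\n"
orders = parse_tab_delimited(raw)
nested = group_by_first_field(orders)
```

Here each element of `nested` is a tuple whose third field holds all the original tuples that share the same first field, which is the kind of nesting a flat relational table cannot express directly.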
