Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2008/03/10 22:03:11 UTC

[Pig Wiki] Update of "FrontPage" by ChrisOlston

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by ChrisOlston:
http://wiki.apache.org/pig/FrontPage

------------------------------------------------------------------------------

= Pig Wiki =

- [http://incubator.apache.org/pig/ Pig] is a platform for analyzing large data sets. Pig's language, Pig Latin, is a simple query algebra that lets you express data transformations such as merging data sets, filtering them, and applying functions to records or groups of records. Users can create their own functions to do special-purpose processing.
+ [http://incubator.apache.org/pig/ Pig] is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. A Pig Latin program consists of a directed acyclic graph where each node represents an operation that transforms data. Operations are of two flavors: (1) relational-algebra style operations such as join, filter, and project; (2) functional-programming style operations such as map and reduce.

+ Pig compiles these dataflow programs into (sequences of) map-reduce jobs and executes them using Hadoop. It is also possible to execute Pig Latin programs in a "local" mode (without a Hadoop cluster), in which case all processing takes place in a single local JVM.
- Pig Latin queries execute in a distributed fashion on a cluster. Our current implementation compiles Pig Latin programs into Map-Reduce jobs, and executes them using Hadoop cluster. It's also possible to execute queries
- in a "local" mode (without Hadoop cluster), with performance limitations caused by running all processing in a single local JVM.
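
To make the dataflow idea concrete, here is a minimal Python sketch of a small pipeline that chains the two flavors of operation described above: relational-style filter, join, and project, followed by a functional-style map and reduce. It runs entirely in one process, loosely analogous to the "local" mode. Everything in it is an assumption made for illustration: the helper names (`filter_op`, `join_op`, `project_op`, `map_reduce`), the sample data, and the field names are hypothetical. It is not Pig Latin and not Pig's API, and it does not show how Pig compiles programs into map-reduce jobs.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part of Pig.
from collections import defaultdict


def filter_op(rows, predicate):
    """Relational-style filter: keep the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def project_op(rows, fields):
    """Relational-style project: keep only the named fields of each row."""
    return [{f: r[f] for f in fields} for r in rows]


def join_op(left, right, key):
    """Relational-style equi-join of two row lists on a shared key field."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def map_reduce(rows, mapper, reducer):
    """Functional-style step: map rows to (key, value) pairs, group, reduce."""
    groups = defaultdict(list)
    for row in rows:
        for k, v in mapper(row):
            groups[k].append(v)
    return {k: reducer(k, vs) for k, vs in groups.items()}


def run_pipeline():
    # Two made-up input data sets (the source nodes of the graph).
    visits = [
        {"user": "amy", "url": "a.com", "secs": 30},
        {"user": "bob", "url": "b.com", "secs": 5},
        {"user": "amy", "url": "c.com", "secs": 12},
        {"user": "cal", "url": "a.com", "secs": 50},
    ]
    users = [
        {"user": "amy", "country": "US"},
        {"user": "bob", "country": "UK"},
        {"user": "cal", "country": "US"},
    ]

    # Each step consumes the output of earlier steps, forming a small DAG.
    long_visits = filter_op(visits, lambda r: r["secs"] >= 10)
    with_country = join_op(long_visits, users, "user")
    slim = project_op(with_country, ["country", "secs"])
    return map_reduce(
        slim,
        mapper=lambda r: [(r["country"], r["secs"])],
        reducer=lambda _key, values: sum(values),
    )


if __name__ == "__main__":
    print(run_pipeline())
```

In this sketch each intermediate result is named and fed into the next step, which is the sense in which a program forms a graph of data-transforming operations.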

== General Information ==