Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2010/03/22 23:14:58 UTC

[Pig Wiki] Update of "owl" by jaytang

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The "owl" page has been changed by jaytang.
http://wiki.apache.org/pig/owl?action=diff&rev1=3&rev2=4

--------------------------------------------------

  
  = Apache Owl Wiki =
  
- The goal of Owl (a.k.a Hadoop metadata system) is to allow users and applications to register data stored on HDFS, search for the data available on HDFS, and associate metadata such as schema, statistics, etc. with a particular data unit or a data set stored on HDFS. The initial goal is to provide a fairly generic, low level abstraction that any user or application on HDFS can use to store an retrieve metadata. Over time a higher level abstractions closely tied to particular applications or tools can be developed.
+ The goal of Owl is to provide a higher-level data management abstraction than that provided by HDFS directories and files.  Applications written in MapReduce and Pig scripts must deal with low-level data management issues such as storage format, serialization/compression schemes, data layout, and efficient data access paths, often with different solutions. Owl attempts to provide a standard way to address these issues.
  
- Please refer to this document for more detailed [[http://wiki.apache.org/pig/Metadata|use case, architecture, data model]]
+ Owl supports the notion of "Owl Tables", a basic unit of data management.  An Owl Table has these characteristics:
+ 
+    * lives in an Owl database namespace and can contain multiple partitions
+    * has columns and rows and supports a unified table-level schema
+    * supports MapReduce and Pig Latin, and potentially other languages
+    * is designed for batch read/write operations
+    * supports external tables (data that already exists on the file system)
+    * has a pluggable architecture for different storage formats, such as Zebra
+    * presents a logically partitioned view of data organization
+    * provides efficient data access mechanisms via partition and projection pruning
+ 
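To make "partition and projection pruning" concrete, here is a minimal in-memory sketch in plain Java. It does not use any Owl API: the `PruningSketch` class, the `Partition` type, the `datestamp` partition key, and the column names are all hypothetical, chosen only for illustration. In a real system the benefit comes from never reading the skipped partitions and columns from storage; here the same idea is shown on in-memory data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PruningSketch {

    // Hypothetical partition: one partition key value plus the rows stored under it.
    static final class Partition {
        final String datestamp;
        final List<Map<String, Object>> rows;

        Partition(String datestamp, List<Map<String, Object>> rows) {
            this.datestamp = datestamp;
            this.rows = rows;
        }
    }

    // Builds one row with three illustrative columns.
    static Map<String, Object> row(String user, String url, int bytes) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("user", user);
        r.put("url", url);
        r.put("bytes", bytes);
        return r;
    }

    // A tiny table with two partitions, keyed by date.
    static List<Partition> sampleTable() {
        return Arrays.asList(
            new Partition("2010-03-21", Arrays.asList(row("carol", "/a", 50))),
            new Partition("2010-03-22", Arrays.asList(row("alice", "/b", 120),
                                                      row("bob", "/c", 300))));
    }

    // Scans the table, skipping partitions and columns the caller did not ask for.
    static List<Map<String, Object>> scan(List<Partition> table,
                                          String wantedDate,
                                          List<String> wantedColumns) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Partition p : table) {
            // Partition pruning: partitions whose key does not match are never visited.
            if (!p.datestamp.equals(wantedDate)) {
                continue;
            }
            for (Map<String, Object> r : p.rows) {
                // Projection pruning: only the requested columns are copied out.
                Map<String, Object> projected = new LinkedHashMap<>();
                for (String column : wantedColumns) {
                    projected.put(column, r.get(column));
                }
                out.add(projected);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(scan(sampleTable(), "2010-03-22", Arrays.asList("user", "bytes")));
    }
}
```

The caller names the partition and the columns it needs, and everything else is skipped. A partitioned table with a table-level schema makes both decisions possible before any data is read.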
+ 
+ Owl supports two major public APIs.  "Owl Driver" provides management APIs against "Owl Table", "Owl Database", and "Partition".  This API is backed by an internal Owl metadata store that runs on Tomcat and a relational database.  "OwlInputFormat" provides a data access API and is modeled after the traditional Hadoop InputFormat.  In the future, we plan to support "OwlOutputFormat" and thus the notion of an "Owl Managed Table", where Owl controls the data flow into and out of "Owl Tables".  Owl supports Pig integration through the OwlPigLoader/Storer module.
  
  
  == Prerequisite ==
  
- Owl high no dependency on the release of Hadoop and Pig
+ Owl depends on Pig for its tuple classes, which serve as the basic data container, and on Hadoop 20 for "OwlInputFormat".  Owl supports Zebra integration out of the box.
  
  == Getting Owl ==
  
@@ -26, +38 @@

     * JDK 1.6
     * Ant 1.7.1
     * download [[http://dev.mysql.com/downloads/connector/j/5.1.html|MySQL 5.1 JDBC driver]]
+    * Oracle
  
  How to compile
  
     * check out latest PIG trunk
+    * compile Pig
     * cd contrib/owl
     * copy MySQL JDBC driver to contrib/owl/java/lib directory
     * ant war (build owl web application)
@@ -40, +54 @@

  For a development environment, Owl supports Jetty 7.0 (with jetty-runner) and Derby 10.5.  For production deployment, Owl supports:
  
     * Tomcat 6.0
-    * MySQL 5.1
+    * MySQL 5.1 or Oracle 11g
  
  After installing Tomcat and MySQL, you will need these files:
  
-    * owl.war - owl web application
+    * owl-<0.x.x>.war - owl web application
+    * owl-<0.x.x>.jar - owl client library (OwlInputFormat and OwlDriver) with all dependent third-party libraries
     * mysql_schema.sql - owl database schema file at contrib/owl/setup/mysql
     * owlServerConfig.xml - owl server configuration file at contrib/owl/setup/mysql
  
@@ -58, +73 @@

  
  == Sample Code == 
  
- Owl comes with a Java-based client.  Client API Javadoc is at: 
+ Owl comes with a Java-based client.  Client API Javadoc is at...  Two key packages contain the public APIs for Owl's main features: "org.apache.hadoop.owl.client" and "org.apache.hadoop.owl.mapreduce".
  
  The attached sample code shows how to write a client application against Owl: