Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2009/05/29 19:58:45 UTC

[jira] Created: (PIG-823) Hadoop Metadata Service

Hadoop Metadata Service
-----------------------

                 Key: PIG-823
                 URL: https://issues.apache.org/jira/browse/PIG-823
             Project: Pig
          Issue Type: New Feature
            Reporter: Olga Natkovich


This JIRA is created to track development of a metadata system for Hadoop. The goal of the system is to allow users and applications to register data stored on HDFS, search for the data available on HDFS, and associate metadata such as schema, statistics, etc. with a particular data unit or a data set stored on HDFS. The initial goal is to provide a fairly generic, low-level abstraction that any user or application on HDFS can use to store and retrieve metadata. Over time, higher-level abstractions closely tied to particular applications or tools can be developed.

Over time, it would make sense for the metadata service to become a subproject within Hadoop. For now, the proposal is to make it a contrib to Pig since Pig SQL is likely to be the first user of the system.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-823) Hadoop Metadata Service

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743053#action_12743053 ] 

Hadoop QA commented on PIG-823:
-------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12416510/owl_otherdeps.tgz
  against trunk revision 803377.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/163/console

This message is automatically generated.


[jira] Commented: (PIG-823) Hadoop Metadata Service

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714539#action_12714539 ] 

Jeff Hammerbacher commented on PIG-823:
---------------------------------------

Hey,

Hadoop already had a metadata service (well defined at http://svn.apache.org/viewvc/hadoop/hive/trunk/metastore/if/hive_metastore.thrift) and a SQL implementation in production use at scale at several organizations. Can any of that work be reused for this purpose? It seems like duplicating effort across subprojects is a bad idea.

Later,
Jeff


[jira] Commented: (PIG-823) Hadoop Metadata Service

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714543#action_12714543 ] 

Olga Natkovich commented on PIG-823:
------------------------------------

We looked at metadata in Hive, and it is really focused on a higher level of abstraction, such as tables and partitions. We would like to have something lower level, more generic, and closer to HDFS. We see a wider use for this system than just support for SQL, though SQL for Pig might be the first user.



[jira] Commented: (PIG-823) Hadoop Metadata Service

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718376#action_12718376 ] 

Matei Zaharia commented on PIG-823:
-----------------------------------

(To clarify, I meant data where the interface to access it is not HDFS; of course HBase is backed by files on HDFS).


[jira] Updated: (PIG-823) Hadoop Metadata Service

Posted by "Sushanth Sowmyan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated PIG-823:
---------------------------------

    Attachment: owl_otherdeps.tgz

owl_otherdeps.tgz : Other binaries 


[jira] Issue Comment Edited: (PIG-823) Hadoop Metadata Service

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714547#action_12714547 ] 

Jeff Hammerbacher edited comment on PIG-823 at 5/29/09 11:48 AM:
-----------------------------------------------------------------

It's an open source project and easily extensible. There are many extensions to the service within Facebook to support more general information. Why not try to add the desired "lower level" metadata to the existing service as a patch to Hive, since it's already got pluggable backends and a server implementation already defined? Also, could you better define what "close to HDFS" means? There's a lot of HDFS metadata stored in the NameNode. Also, the initial implementation of the metadata repository for Hive stored data in HDFS, but it was found to be quite useful to have a separate service for metadata. Perhaps you could learn from their experiences?

      was (Author: hammer):
    It's an open source project and easily extensible. There are many extensions to the service within Facebook to support more general information. Why not try to add them to the existing service, since it's already got pluggable backends and a server implementation already defined?
  

[jira] Updated: (PIG-823) Hadoop Metadata Service

Posted by "Sushanth Sowmyan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated PIG-823:
---------------------------------

    Attachment: owl.patch.gz
                owl.filelist

owl.filelist : the output of an svn add contrib/owl, after which the patch was generated (it also lists binaries that are yet to be attached).
owl.patch.gz : Owl-0.1 patch, to be applied from outside the contrib directory in pig.

(Libraries and other binaries are yet to be attached.)


[jira] Updated: (PIG-823) Hadoop Metadata Service

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-823:
---------------------------

        Status: Resolved  (was: Patch Available)
    Resolution: Duplicate


[jira] Commented: (PIG-823) Hadoop Metadata Service

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716722#action_12716722 ] 

Jeff Hammerbacher commented on PIG-823:
---------------------------------------

Hey Alan,

Thanks for the additional detail. I suppose I can wait for the document to be released to the public, but it sounds as if you're creating a separate "extended attributes" service to host non-core file and directory metadata separately from the NN. It's not clear to me that this is a positive development for Hadoop. Perhaps we should spend the engineering effort on a single, partitioned, available metadata service for all file and directory attributes? That project has a larger scope but is potentially a cleaner solution for the long term.

Later,
Jeff


[jira] Commented: (PIG-823) Hadoop Metadata Service

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717841#action_12717841 ] 

Alan Gates commented on PIG-823:
--------------------------------

http://wiki.apache.org/pig/Metadata


[jira] Commented: (PIG-823) Hadoop Metadata Service

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717876#action_12717876 ] 

Matei Zaharia commented on PIG-823:
-----------------------------------

I agree with Jeff that it might be better to make this service a feature of HDFS rather than a component of Pig. A metadata service might be useful to people who don't use Pig at all, e.g. those who just load data and process it with MapReduce (which is a use case you cover on the Wiki page). Having a single, standard metadata service would allow unrelated tools for loading data, processing it, browsing it, etc. to interoperate.


[jira] Commented: (PIG-823) Hadoop Metadata Service

Posted by "Amr Awadallah (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717933#action_12717933 ] 

Amr Awadallah commented on PIG-823:
-----------------------------------

+1 to unified meta-data service.

-- amr



[jira] Commented: (PIG-823) Hadoop Metadata Service

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716679#action_12716679 ] 

Alan Gates commented on PIG-823:
--------------------------------

By lower level of metadata, we don't mean storing information already present in the namenode.  The difference is in the model perspective.  Hive's metadata model consists of tables and partitions, which is appropriate since it works with SQL which presents a relational view to users.  Our proposal is to construct a metadata service that models directories and files.  Map Reduce and Pig Latin present a file based view to users, and thus this model is more appropriate for those tools.
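The difference between the two models can be illustrated with a minimal sketch of a registry keyed by path rather than by table. This is not the Owl implementation or any existing API: the class name, method names, and example paths are all hypothetical, and the rule that a file with no entry of its own falls back to its nearest registered ancestor directory is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from posixpath import dirname, normpath


@dataclass
class Entry:
    """Metadata attached to one HDFS-style path (a file or a directory)."""
    path: str
    metadata: dict = field(default_factory=dict)


class PathMetadataRegistry:
    """Hypothetical in-memory registry that models directories and files."""

    def __init__(self):
        self._entries = {}

    def register(self, path, **metadata):
        """Register a path, or add metadata to an already registered path."""
        path = normpath(path)
        entry = self._entries.setdefault(path, Entry(path))
        entry.metadata.update(metadata)
        return entry

    def lookup(self, path):
        """Return metadata for the path itself or, failing that, for its
        nearest registered ancestor directory (an assumed fallback rule)."""
        path = normpath(path)
        while True:
            if path in self._entries:
                return self._entries[path].metadata
            parent = dirname(path)
            if parent == path:  # reached the root without a match
                return {}
            path = parent

    def search(self, under="/", **criteria):
        """List registered paths at or below `under` whose metadata
        matches every key/value pair in `criteria`."""
        under = normpath(under)
        prefix = under if under.endswith("/") else under + "/"
        return sorted(
            p for p, e in self._entries.items()
            if (p == under or p.startswith(prefix))
            and all(e.metadata.get(k) == v for k, v in criteria.items())
        )


# Example paths and attribute names are made up for illustration.
registry = PathMetadataRegistry()
registry.register("/data/clicks", schema="user:chararray, url:chararray", owner="etl")
registry.register("/data/clicks/2009-05-29/part-00000", rows=1000)

# A file with no entry of its own picks up its directory's metadata.
print(registry.lookup("/data/clicks/2009-05-30/part-00001"))
# Search is by location in the hierarchy plus attribute values.
print(registry.search("/data", owner="etl"))
```

A table-and-partition model would key the same information by table name and partition values instead. The sketch keys it by location in the file system, which is the view that MapReduce and Pig Latin jobs already work with.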

I met a couple of times with the Facebook team to discuss metadata, and our desire to have a hierarchical model.  They agreed that this did not fit with the model they were using.  We both agreed that any metadata service built around the files should have an interface that their metadata service can easily connect to, so that if a user wishes to use both they can do so without needing to register metadata in both.

As for documentation, we're working on getting ready for external release.  We hope to post it in the next week or so.



[jira] Commented: (PIG-823) Hadoop Metadata Service

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718063#action_12718063 ] 

Alan Gates commented on PIG-823:
--------------------------------

In response to Matei's comment:

The intent is not that this is Pig metadata, but that it be grid-wide metadata.  We don't want to put it directly in HDFS by extending the namenode, since the namenode is already heavily loaded and a central contention point in the system.  We also want it to remain optional, as many users will not need it.

The vision is that this will be a separate module that Hadoop users can choose to install and use with their system, along with other modules they use, such as Pig, Hive, Chukwa, etc.

The Pig team is volunteering to put it in our contrib for now because Pig is interested in it and willing to devote the resources to help it get started.


[jira] Updated: (PIG-823) Hadoop Metadata Service

Posted by "Sushanth Sowmyan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated PIG-823:
---------------------------------

    Status: Patch Available  (was: Open)

owl.filelist : the output of an svn add contrib/owl, after which the patch was generated (it also lists binaries that are yet to be attached).
owl.patch.gz : Owl-0.1 patch, to be applied from outside the contrib directory in pig.
owl_libdeps.tgz : libraries that extract to contrib/owl/java/lib/
owl_otherdeps.tgz : Other binaries

--

All the .tgz files can be extracted from your pig root dir (they extract to contrib/owl from the working dir) after applying the patch.

This is our initial upload of the 0.1 implementation of owl.


[jira] Issue Comment Edited: (PIG-823) Hadoop Metadata Service

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714539#action_12714539 ] 

Jeff Hammerbacher edited comment on PIG-823 at 5/29/09 11:20 AM:
-----------------------------------------------------------------

Hey,

Hadoop already has a metadata service (well defined at http://svn.apache.org/viewvc/hadoop/hive/trunk/metastore/if/hive_metastore.thrift) and a SQL implementation in production use at scale at several organizations. Can any of that work be reused for this purpose? It seems like duplicating effort across subprojects is a bad idea.

Later,
Jeff

      was (Author: hammer):
    Hey,

Hadoop already had a metadata service (well defined at http://svn.apache.org/viewvc/hadoop/hive/trunk/metastore/if/hive_metastore.thrift) and a SQL implementation in production use at scale at several organizations. Can any of that work be reused for this purpose? It seems like duplicating effort across subprojects is a bad idea.

Later,
Jeff
  


[jira] Commented: (PIG-823) Hadoop Metadata Service

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714547#action_12714547 ] 

Jeff Hammerbacher commented on PIG-823:
---------------------------------------

It's an open source project and easily extensible. There are many extensions to the service within Facebook to support more general information. Why not try to add them to the existing service, since it already has pluggable backends and a server implementation defined?



[jira] Commented: (PIG-823) Hadoop Metadata Service

Posted by "Amr Awadallah (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718313#action_12718313 ] 

Amr Awadallah commented on PIG-823:
-----------------------------------

Sounds good, thanks for elaborating.



[jira] Updated: (PIG-823) Hadoop Metadata Service

Posted by "Sushanth Sowmyan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanth Sowmyan updated PIG-823:
---------------------------------

    Attachment: owl_libdeps.tgz

owl_libdeps.tgz : libraries that extract to contrib/owl/java/lib/

> Hadoop Metadata Service
> -----------------------
>
>                 Key: PIG-823
>                 URL: https://issues.apache.org/jira/browse/PIG-823
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Olga Natkovich
>         Attachments: owl.filelist, owl.patch.gz, owl_libdeps.tgz


[jira] Commented: (PIG-823) Hadoop Metadata Service

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718374#action_12718374 ] 

Matei Zaharia commented on PIG-823:
-----------------------------------

That sounds great. If this is sufficiently extensible, it might potentially even be useful for data that is not in HDFS, such as HBase tables (though we should avoid making the system overly complex).



[jira] Commented: (PIG-823) Hadoop Metadata Service

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744326#action_12744326 ] 

Jeff Hammerbacher commented on PIG-823:
---------------------------------------

Hey,

Great to see Owl source! I've filed a ticket over on the Hive project (https://issues.apache.org/jira/browse/HIVE-762) to see if we can find some common ground between Pig and Hive's metadata needs; it would be great to have a single metadata service for all of Hadoop's structured data manipulation tools. If you're interested, please chime in there (or open a ticket here? Whatever seems sane to you).

Thanks,
Jeff

> Hadoop Metadata Service
> -----------------------
>
>                 Key: PIG-823
>                 URL: https://issues.apache.org/jira/browse/PIG-823
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Olga Natkovich
>         Attachments: owl.filelist, owl.patch.gz, owl_libdeps.tgz, owl_otherdeps.tgz


[jira] Commented: (PIG-823) Hadoop Metadata Service

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716463#action_12716463 ] 

Jeff Hammerbacher commented on PIG-823:
---------------------------------------

Hey Olga,

Really looking forward to seeing more discussion on this issue. The NameNode already contains file metadata like ctime, mtime, the block list, permissions, etc. Will the proposed metadata service subsume those attributes as well? Curious to see the proposed design.

Thanks,
Jeff

