Posted to dev@hive.apache.org by "Johan Oskarsson (JIRA)" <ji...@apache.org> on 2008/12/05 17:58:44 UTC

[jira] Created: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore

Don't fetch information on Partitions from HDFS instead of MetaStore
--------------------------------------------------------------------

                 Key: HIVE-126
                 URL: https://issues.apache.org/jira/browse/HIVE-126
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Metastore
    Affects Versions: 0.19.0
            Reporter: Johan Oskarsson
            Assignee: Johan Oskarsson
             Fix For: 0.19.0


When investigating HIVE-91, an issue came up: the information on which partitions a table contains is loaded by listing the directories in the table directory on HDFS, and this listing then overrules what is in the MetaStore whenever the two differ.

* Wouldn't it be preferable for the MetaStore to be the single authority on what the table contains?
* It will also be a major hassle (or impossible?) to retrieve this information from HDFS for external tables that have non-standard partition names (HIVE-91), such as table/2008/01/08/portugal, where "2008/01/08" is one partition value and "portugal" is another.
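To make the problem concrete, here is a minimal Python sketch of directory-based partition discovery. It assumes the conventional `key=value` directory naming (for example `ds=2008-01-08/country=portugal`); the function names and the keys `ds`/`country` are hypothetical, and this is not Hive's actual code. A path like `2008/01/08/portugal` carries no such markers, so the listing yields nothing for it.

```python
from typing import List, Optional, Set, Tuple

# A partition spec is an ordered tuple of (key, value) pairs.
PartitionSpec = Tuple[Tuple[str, str], ...]


def parse_partition_path(rel_path: str, partition_keys: List[str]) -> Optional[PartitionSpec]:
    """Parse 'k1=v1/k2=v2' (relative to the table directory) into a spec.

    Returns None unless the path has exactly one 'key=value' component per
    partition key, in the table's key order.
    """
    components = rel_path.strip("/").split("/")
    if len(components) != len(partition_keys):
        return None
    spec = []
    for key, component in zip(partition_keys, components):
        name, sep, value = component.partition("=")
        if not sep or name != key:
            return None
        spec.append((name, value))
    return tuple(spec)


def partitions_from_listing(dirs: List[str], partition_keys: List[str]) -> Set[PartitionSpec]:
    """Directory-based discovery: every parseable directory is a partition;
    anything else under the table directory is silently ignored."""
    found = set()
    for rel_path in dirs:
        spec = parse_partition_path(rel_path, partition_keys)
        if spec is not None:
            found.add(spec)
    return found
```

With `keys = ["ds", "country"]`, `parse_partition_path("ds=2008-01-08/country=portugal", keys)` returns `(("ds", "2008-01-08"), ("country", "portugal"))`, while `partitions_from_listing(["2008/01/08/portugal"], keys)` returns an empty set. If that listing overrules the MetaStore, a partition registered there for that directory is simply dropped.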

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore

Posted by "Johan Oskarsson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658664#action_12658664 ] 

Johan Oskarsson commented on HIVE-126:
--------------------------------------

Sure, I'll update HIVE-142 after Christmas; there is already a half-working example patch if you want to have a look. It's worth noting that HIVE-142 will require this patch to work.


[jira] Commented: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655384#action_12655384 ] 

Joydeep Sen Sarma commented on HIVE-126:
----------------------------------------

Let's hold onto this while we figure out where we are going with HIVE-91.


RE: [jira] Commented: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore

Posted by Joydeep Sen Sarma <js...@facebook.com>.
This makes sense. We should allow this as an option; it was discussed initially in HIVE-91 (inferring partitioning information from HDFS).

-----Original Message-----
From: Fred Oko [mailto:foko@hi5.com] 
Sent: Wednesday, January 21, 2009 5:48 PM
To: hive-dev@hadoop.apache.org
Subject: Re: [jira] Commented: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore


Re: [jira] Commented: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore

Posted by Fred Oko <fo...@hi5.com>.
Coming into this a tad late, but we relied on the simplicity of the original
state of this code, i.e. "// let's trust hdfs partitions for now". Is it
possible to make this configurable? Basically, we came up with a design that
treats Hive solely as a post-processing overlay: we feed data into HDFS
directly, but we are fine with using the partitioning scheme Hive expects.
We could switch to HIVE-91, but for our current purposes integrating Hive
into the data load stage adds no value.
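One way such a switch could look, as a hedged Python sketch: a single boolean setting decides whether the partitions found on HDFS or those recorded in the MetaStore are used. The setting name `trust_hdfs_partitions` and the function are hypothetical, not real Hive configuration; partition specs are modelled as tuples of (key, value) pairs.

```python
from typing import Dict, Set, Tuple

PartitionSpec = Tuple[Tuple[str, str], ...]

# Hypothetical setting name; not an actual Hive configuration property.
TRUST_HDFS_KEY = "trust_hdfs_partitions"


def effective_partitions(conf: Dict[str, str],
                         metastore: Set[PartitionSpec],
                         on_hdfs: Set[PartitionSpec]) -> Set[PartitionSpec]:
    """Pick the partition source according to a configuration switch.

    Switch unset or false: the MetaStore is the single authority.
    Switch set to "true": the old behaviour, where the partitions found
    in the table directory on HDFS are used instead.
    """
    trust_hdfs = conf.get(TRUST_HDFS_KEY, "false").strip().lower() == "true"
    return set(on_hdfs) if trust_hdfs else set(metastore)
```

With the switch unset, data written straight into HDFS stays invisible until it is registered in the MetaStore; with `{"trust_hdfs_partitions": "true"}`, the overlay workflow described above keeps working unchanged.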


On 1/13/09 2:09 AM, "Johan Oskarsson (JIRA)" <ji...@apache.org> wrote:

> 
>     [ https://issues.apache.org/jira/browse/HIVE-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663286#action_12663286 ]
> 
> Johan Oskarsson commented on HIVE-126:
> --------------------------------------
> 
> Since HIVE-142 is committed and the code in this ticket has been committed by
> accident, perhaps it's best to just update the CHANGES file with the ticket
> information and close this one?


[jira] Commented: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore

Posted by "Johan Oskarsson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663286#action_12663286 ] 

Johan Oskarsson commented on HIVE-126:
--------------------------------------

Since HIVE-142 has been committed, and the code in this ticket was committed by accident, perhaps it's best to just update the CHANGES file with the ticket information and close this one?


[jira] Updated: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore

Posted by "Johan Oskarsson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johan Oskarsson updated HIVE-126:
---------------------------------

    Attachment: HIVE-126.patch

First attempt at a patch using my preferred solution: removing the code that reads partition information from HDFS entirely and relying on the MetaStore for accurate information instead.
That said, I assume the code was put there for a reason, so I'd love to hear more about it. Perhaps an fsck-type command could be implemented to compare on-disk data with the MetaStore?

The other solution I can think of that would let HIVE-91 move forward is to get partition information from HDFS only when the table is not an external table.
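The second option could be sketched in Python as follows. All names are hypothetical: `TableInfo` is a minimal stand-in for a MetaStore table record, and `list_hdfs_partitions` stands in for the existing directory-listing code. External tables get their partitions from the MetaStore only, while managed tables keep the HDFS-based behaviour.

```python
from dataclasses import dataclass, field
from typing import Callable, Set, Tuple

PartitionSpec = Tuple[Tuple[str, str], ...]


@dataclass
class TableInfo:
    """Hypothetical, minimal stand-in for a table's MetaStore record."""
    name: str
    external: bool
    metastore_partitions: Set[PartitionSpec] = field(default_factory=set)


def get_partitions(table: TableInfo,
                   list_hdfs_partitions: Callable[[str], Set[PartitionSpec]]) -> Set[PartitionSpec]:
    if table.external:
        # External tables may use arbitrary directory layouts (HIVE-91),
        # so only the MetaStore can say what the partitions are.
        return set(table.metastore_partitions)
    # Managed tables follow the standard layout, so the existing
    # HDFS-listing behaviour is kept for them.
    return list_hdfs_partitions(table.name)
```

The design keeps the old safeguard where it is cheap and reliable (standard layouts) and drops it exactly where HIVE-91 makes directory names impossible to interpret.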


[jira] Updated: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore

Posted by "Johan Oskarsson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johan Oskarsson updated HIVE-126:
---------------------------------

    Attachment: HIVE-126.patch

Updated patch with a unit test for the added code.


[jira] Updated: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-126:
----------------------------

      Resolution: Fixed
    Release Note: HIVE-126. Don't fetch information on Partitions from HDFS instead of MetaStore. (Johan Oskarsson via zshao).
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Code was committed in 728771. Thanks Johan!


[jira] Commented: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654533#action_12654533 ] 

Joydeep Sen Sarma commented on HIVE-126:
----------------------------------------

Yes, the code was put there as a safeguard. The history here is that we migrated our current Hive warehouse from an older version of the software and were worried about not capturing all the older partitions in the new metastore. We knew the code was a hack, but it was a purely defensive measure.

A couple of comments:
- We should move all metadata logic (including hacks, if any :-)) to the metastore server side. Otherwise we are creating a different view for Java vs. Thrift clients.
- Yes, +1 on an fsck-type command to replace this hack. I would actually like to run such a command on our current tables before removing the hack.

The core issue is whether we can make this change without having an fsck-like utility in some form (even a custom Java program). Such a utility would also preserve some of the current code for handling this case.

-----

For a command-line interface, one might want to check the entire database, just a table, or even just one partition. Other metadata checks will also be added over time (for example, whether the file types on disk agree with the metadata records, bucketing information, etc.). So here's a strawman proposal for a new command:

alter table <DB>[.TABLE [PARTITION-SPEC]] check [TYPE-LIST]

where TYPE defaults to 'all' (check for all kinds of errors) but can be set to a specific type. In this case, for example, we could have a type called 'partitions' (and over time add other types like 'fileformat', etc.). For v1 we can just drop the TYPE-LIST altogether.

The check command can produce a list of steps needed to fix the inconsistencies (such as adding to the metastore any directories that are in HDFS but not in the metastore). Actually performing such steps would require user confirmation (y/n).
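A hedged Python sketch of what the 'partitions' check type might do, purely to illustrate the proposal; all names are hypothetical and partition specs are tuples of (key, value) pairs. `check_partitions` only reports, and `repair_partitions` applies the one fix named above (registering HDFS-only partitions) only if `confirm` returns true. Reporting metastore partitions that have no directory on HDFS is an extra assumption beyond the proposal.

```python
from typing import Callable, List, Set, Tuple

PartitionSpec = Tuple[Tuple[str, str], ...]


def format_spec(spec: PartitionSpec) -> str:
    return "/".join(f"{key}={value}" for key, value in spec)


def check_partitions(metastore: Set[PartitionSpec], on_hdfs: Set[PartitionSpec]) -> List[str]:
    """Report discrepancies between HDFS and the metastore; changes nothing."""
    report = []
    for spec in sorted(on_hdfs - metastore):
        report.append(f"add {format_spec(spec)} to metastore (found on HDFS only)")
    for spec in sorted(metastore - on_hdfs):
        report.append(f"{format_spec(spec)} is in metastore but not on HDFS")
    return report


def repair_partitions(metastore: Set[PartitionSpec], on_hdfs: Set[PartitionSpec],
                      confirm: Callable[[str], bool]) -> Set[PartitionSpec]:
    """Add HDFS-only partitions to the metastore, but only after a y/n confirmation."""
    missing = on_hdfs - metastore
    if missing and confirm(f"Add {len(missing)} partition(s) to the metastore? (y/n) "):
        return metastore | missing
    return set(metastore)
```

Interactively, `confirm` could be `lambda prompt: input(prompt).strip().lower() == "y"`; keeping it a parameter makes the check testable without a terminal.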

---
Java interfaces: we have been pretty cavalier with Java interfaces. Right now most of Hive's public methods (other than the SerDe stuff) are not accessed by any codebase outside Hive. So I would say just remove them for now; as we go through the code module by module, we can identify the modules we actually want to expose publicly.

[jira] Commented: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658643#action_12658643 ] 

Prasad Chakka commented on HIVE-126:
------------------------------------

Johan,
I think we should have HIVE-142 in place before removing this code; it would help us verify that nothing is amiss.


[jira] Updated: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-126:
--------------------------------

    Fix Version/s: 0.3.0
                       (was: 0.6.0)


[jira] Commented: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658078#action_12658078 ] 

Prasad Chakka commented on HIVE-126:
------------------------------------

That code was needed when both the old and new code bases were operating against the same data at the same time. It is no longer needed and can be safely removed.

+1


[jira] Commented: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore

Posted by "Johan Oskarsson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654776#action_12654776 ] 

Johan Oskarsson commented on HIVE-126:
--------------------------------------

I created a new issue for the metastore check command, HIVE-142. Thanks for the thorough reply.

Personally, I'd like to have this patch committed first so I can finish my work on HIVE-91 (I've got a patch more or less ready), but I can see why you'd want HIVE-142 done first.


[jira] Commented: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore

Posted by "Johan Oskarsson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654185#action_12654185 ] 

Johan Oskarsson commented on HIVE-126:
--------------------------------------

Forgot to ask: what's the policy for removing public methods in Hive? Release one version with them deprecated and then remove them in the following release, like Hadoop?
If so, we could deprecate the methods and, in the next release, only get partition information from HDFS if it isn't an external table. Then in the following release we would remove those methods permanently. Thoughts?


[jira] Commented: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655329#action_12655329 ] 

Joydeep Sen Sarma commented on HIVE-126:
----------------------------------------

+1, looks good.

Let me take a look at HIVE-142 as well; it would be better if we could get them both in at the same time.


[jira] Updated: (HIVE-126) Don't fetch information on Partitions from HDFS instead of MetaStore

Posted by "Johan Oskarsson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johan Oskarsson updated HIVE-126:
---------------------------------

    Status: Patch Available  (was: Open)
