Posted to dev@hive.apache.org by "Josh Ferguson (JIRA)" <ji...@apache.org> on 2009/01/14 19:19:59 UTC

[jira] Created: (HIVE-232) metastore.warehouse configuration should use inherited hadoop configuration

metastore.warehouse configuration should use inherited hadoop configuration
---------------------------------------------------------------------------

                 Key: HIVE-232
                 URL: https://issues.apache.org/jira/browse/HIVE-232
             Project: Hadoop Hive
          Issue Type: Bug
            Reporter: Josh Ferguson


The hive.metastore.warehouse.dir configuration property in hive-*.xml needs to use the protocol, host, and port inherited from the fs.name property in hadoop-site.xml.

When it doesn't, and no protocol is found, a broad range of "Move" operations in which the source and target are both in the DFS will fail.

Currently this can be worked around by prepending the protocol, host, and port of the hadoop nameserver to the value of the hive.metastore.warehouse.dir property.
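
As a rough illustration of the workaround described above, the following Java sketch prepends a scheme and host:port to a bare warehouse path. It uses only java.net.URI; the class name WarehouseDirQualifier, the method qualify, and the hdfs://localhost:9000 address are assumptions made for this example and are not Hive or Hadoop APIs.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not part of Hive or Hadoop.
class WarehouseDirQualifier {

    // Returns warehouseDir unchanged if it already names a scheme; otherwise
    // prepends the scheme and authority (host:port) taken from defaultFs.
    static String qualify(String defaultFs, String warehouseDir) {
        URI dir = URI.create(warehouseDir);
        if (dir.getScheme() != null) {
            return warehouseDir;
        }
        URI fs = URI.create(defaultFs);
        try {
            return new URI(fs.getScheme(), fs.getAuthority(), dir.getPath(), null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed example values: a namenode at localhost:9000 and a warehouse path.
        System.out.println(qualify("hdfs://localhost:9000", "/user/hive/warehouse"));
    }
}
```

With the assumed values, the qualified result is hdfs://localhost:9000/user/hive/warehouse, which is the kind of value the workaround puts into hive.metastore.warehouse.dir by hand.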

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-232) metastore.warehouse configuration should use inherited hadoop configuration

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663990#action_12663990 ] 

Joydeep Sen Sarma commented on HIVE-232:
----------------------------------------

ok with the change in the metastore. DB.java also has a reference to METASTOREWAREHOUSEDIR - but it's not clear that it's being used at all.

the change in LoadSemanticAnalyzer is probably not necessary (a test case might have made this clear). if i am reading this right - the net effect being desired is that the from path should use fs.default.name for scheme/authority - right?

but this was already happening. it's a little convoluted, but reading the (current) code:

    fs = FileSystem.get(fromURI, conf);
    String fromAuthority = null;

    // fall back to configuration based scheme if necessary
    if(StringUtils.isEmpty(fromScheme)) {
      fromScheme = fs.getUri().getScheme();
      fromAuthority = fs.getUri().getAuthority();
    }

    if(fromScheme.equals("hdfs")) {
      fromAuthority = StringUtils.isEmpty(fromURI.getAuthority()) ?
        fs.getUri().getAuthority() : fromURI.getAuthority();
    }

the fs initialization will get the scheme and authority from fs.default.name if the fromURI does not supply them. then we test, one by one, whether scheme and authority are already supplied - if not - we use the ones supplied by fs (so fs.default.name is the default for both).
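
The fallback order described above can be sketched in plain Java without a Hadoop dependency. This is only an approximation of the quoted snippet: fsUri stands in for fs.getUri(), and the class name FromPathResolver, the method resolve, and the hdfs://localhost:9000 and nn2:8020 addresses are assumptions made for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustration only: fsUri stands in for fs.getUri() from the snippet above.
class FromPathResolver {

    static URI resolve(URI fromURI, URI fsUri) {
        String fromScheme = fromURI.getScheme();
        String fromAuthority = null;

        // Fall back to the file system's scheme and authority if none was given.
        if (fromScheme == null || fromScheme.isEmpty()) {
            fromScheme = fsUri.getScheme();
            fromAuthority = fsUri.getAuthority();
        }

        // For hdfs, prefer an authority given in the path, else the file system's.
        if ("hdfs".equals(fromScheme)) {
            String given = fromURI.getAuthority();
            fromAuthority = (given == null || given.isEmpty()) ? fsUri.getAuthority() : given;
        }

        try {
            return new URI(fromScheme, fromAuthority, fromURI.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI fsUri = URI.create("hdfs://localhost:9000"); // assumed fs.default.name value
        System.out.println(resolve(URI.create("/user/hive/warehouse/dir1"), fsUri));
        System.out.println(resolve(URI.create("hdfs://nn2:8020/data/in"), fsUri));
    }
}
```

In this sketch a bare path picks up both scheme and authority from fsUri, while a path that already names an hdfs authority keeps it. Whether the real fs object carries the fs.default.name values at this point is exactly what the thread goes on to debate.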

unless i am missing something - we are just trying to do the same thing in a slightly different way?






[jira] Commented: (HIVE-232) metastore.warehouse configuration should use inherited hadoop configuration

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663999#action_12663999 ] 

Prasad Chakka commented on HIVE-232:
------------------------------------

The reference in DB.java will go away after removing old metastore code.

when the input path is simply '/user/hive/warehouse/dir1', the FileSystem.get() call doesn't know which scheme/auth to use. It doesn't use the value of the fs.default.name config param. That is why Josh/Jeremy were getting errors when 'hdfs://localhost:9000' wasn't in the load inpath file name.

Maybe I could just set fromAuthority to null in local mode and, if the scheme is hdfs, set it to whatever is given in the input URI or to fs.default.name's authority, instead of determining it from fs.getUri().getAuthority().
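
A small sketch of what a bare path does and does not carry, independent of how FileSystem.get() fills in defaults. It parses the two path forms mentioned above with java.net.URI only; the class name PathParts and the method describe are made up for this example.

```java
import java.net.URI;

// Illustration only; the class and method names are made up for this example.
class PathParts {

    // Describes the scheme and authority that java.net.URI finds in a path string.
    static String describe(String path) {
        URI uri = URI.create(path);
        return "scheme=" + uri.getScheme() + ", authority=" + uri.getAuthority();
    }

    public static void main(String[] args) {
        System.out.println(describe("/user/hive/warehouse/dir1"));
        System.out.println(describe("hdfs://localhost:9000/user/hive/warehouse/dir1"));
    }
}
```

The bare path yields a null scheme and a null authority, so any code that reads those components straight from the input URI has to supply defaults from somewhere else.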





[jira] Updated: (HIVE-232) metastore.warehouse configuration should use inherited hadoop configuration

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HIVE-232:
----------------------------------

    Hadoop Flags: [Reviewed]
          Status: Patch Available  (was: Open)



[jira] Issue Comment Edited: (HIVE-232) metastore.warehouse configuration should use inherited hadoop configuration

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664000#action_12664000 ] 

jsensarma edited comment on HIVE-232 at 1/14/09 9:23 PM:
-----------------------------------------------------------------

hmm - i don't understand this. maybe we need to ping dhruba. fs object has to have scheme auth - otherwise how does it know what file system to talk to (and the fs object has already been instantiated - it's either a DistributedFileSystem object or a LocalFileSystem based object). 

Is this reproducible (with a populated fs.default.name)?

      was (Author: jsensarma):
    hmm - i don't understand this. maybe we need to ping dhruba. fs object has to have scheme auth - otherwise how does it know what file system to talk to (and the fs object has already been instantiated - it's not either a DistributedFileSystem object or a LocalFileSystem based object). 

Is this reproducible (with a populated fs.default.name)?
  


[jira] Commented: (HIVE-232) metastore.warehouse configuration should use inherited hadoop configuration

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664001#action_12664001 ] 

Prasad Chakka commented on HIVE-232:
------------------------------------

yeah, it is reproducible on our production cluster. i think it becomes null, which means the local file system. we can confirm it with dhruba/the code tomorrow.



[jira] Commented: (HIVE-232) metastore.warehouse configuration should use inherited hadoop configuration

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664000#action_12664000 ] 

Joydeep Sen Sarma commented on HIVE-232:
----------------------------------------

hmm - i don't understand this. maybe we need to ping dhruba. fs object has to have scheme auth - otherwise how does it know what file system to talk to (and the fs object has already been instantiated - it's not either a DistributedFileSystem object or a LocalFileSystem based object). 

Is this reproducible (with a populated fs.default.name)?



[jira] Assigned: (HIVE-232) metastore.warehouse configuration should use inherited hadoop configuration

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur reassigned HIVE-232:
-------------------------------------

    Assignee: Prasad Chakka



[jira] Commented: (HIVE-232) metastore.warehouse configuration should use inherited hadoop configuration

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664002#action_12664002 ] 

Joydeep Sen Sarma commented on HIVE-232:
----------------------------------------

it can't default to the local file system (that's the whole point of fs.default.name). but perhaps the uri/scheme are populated later on, on demand ..

if that is the case (and the current code is not working) - i am ok with the changes.

man - we would be able to regression test this kind of stuff if we had the minimr/dfs stuff set up ..



[jira] Updated: (HIVE-232) metastore.warehouse configuration should use inherited hadoop configuration

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-232:
--------------------------------

    Fix Version/s: 0.3.0
                       (was: 0.6.0)
      Component/s: Configuration



[jira] Commented: (HIVE-232) metastore.warehouse configuration should use inherited hadoop configuration

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664298#action_12664298 ] 

Joydeep Sen Sarma commented on HIVE-232:
----------------------------------------

+1

do we have a good explanation for what was happening in Josh's case then?



[jira] Updated: (HIVE-232) metastore.warehouse configuration should use inherited hadoop configuration

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Chakka updated HIVE-232:
-------------------------------

    Attachment: hive-232.2.patch

fusion mount truncated my previous patch. reattaching the file.



[jira] Updated: (HIVE-232) metastore.warehouse configuration should use inherited hadoop configuration

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HIVE-232:
----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.2.0
           Status: Resolved  (was: Patch Available)

I just committed this. Thanks Prasad!



[jira] Updated: (HIVE-232) metastore.warehouse configuration should use inherited hadoop configuration

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Chakka updated HIVE-232:
-------------------------------

    Attachment: hive-232.patch



[jira] Updated: (HIVE-232) metastore.warehouse configuration should use inherited hadoop configuration

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Chakka updated HIVE-232:
-------------------------------

    Attachment: hive-232.2.patch

I think FileSystem.get() does return the configured default scheme and authority. Here is the updated patch with just the changes to make hdfs relative paths work and the metastore.warehouse param work properly.
