Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2010/01/22 02:54:54 UTC

[jira] Created: (HIVE-1083) allow sub-directories for a table/partition

allow sub-directories for a table/partition
-------------------------------------------

                 Key: HIVE-1083
                 URL: https://issues.apache.org/jira/browse/HIVE-1083
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Namit Jain
             Fix For: 0.6.0


Subdirectories should be allowed for tables/partitions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1083) allow sub-directories for an external table/partition

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861920#action_12861920 ] 

John Sichi commented on HIVE-1083:
----------------------------------

Correction: the local file system is probably OK. I just realized that when I tested, I was using the stock Hadoop 0.20 release, which does not include MAPREDUCE-1501.


> allow sub-directories for an external table/partition
> -----------------------------------------------------
>
>                 Key: HIVE-1083
>                 URL: https://issues.apache.org/jira/browse/HIVE-1083
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: Namit Jain
>            Assignee: Zheng Shao
>             Fix For: 0.6.0
>
>
> Sometimes users want to define an external table/partition based on all files (recursively) inside a directory.
> Currently most of the Hadoop InputFormat classes do not support that. We should extract all files recursively in the directory, and add them to the input path of the job.



[jira] Assigned: (HIVE-1083) allow sub-directories for a table/partition

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao reassigned HIVE-1083:
--------------------------------

    Assignee:     (was: Vikram Chandrasekhar)

> allow sub-directories for a table/partition
> -------------------------------------------
>
>                 Key: HIVE-1083
>                 URL: https://issues.apache.org/jira/browse/HIVE-1083
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>             Fix For: 0.6.0
>
>
> Subdirectories should be allowed for tables/partitions.



[jira] Commented: (HIVE-1083) allow sub-directories for a table/partition

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803576#action_12803576 ] 

Raghotham Murthy commented on HIVE-1083:
----------------------------------------

One use case where this will be helpful is creating external tables on existing directory trees. We currently need to create one partition per lowest-level directory. Instead, it would be great if Hive allowed creating a partition on a top-level directory and then picked up all files within that directory tree.
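
As a rough illustration of the requested behavior (written in Python rather than Hive's Java, and against the local file system only), the sketch below walks a top-level directory and collects every file beneath it. That is the set of files a partition defined on the top-level directory would need to read. The function name `list_files_recursively` is hypothetical and is not part of Hive or Hadoop.

```python
import os


def list_files_recursively(root):
    """Return a sorted list of all files found under root, at any depth.

    Hypothetical helper for illustration only; not a Hive or Hadoop API.
    """
    found = []
    # os.walk visits root and every directory below it.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)
```

With this approach a single partition location is enough, instead of one partition per lowest-level directory.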




[jira] Updated: (HIVE-1083) allow sub-directories for an external table/partition

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1083:
-----------------------------

    Labels: inputformat  (was: )

Corner cases:
C1. We have 4 external tables: abc_recursive, abc, abc_def_recursive, abc_def.
abc_recursive and abc both point to /abc.
abc_def and abc_def_recursive both point to /abc/def.
abc_recursive and abc_def_recursive have the "recursive" bit set.

In ExecDriver, given all tables, we need to find all paths that need to be added to the input path.
In MapOperator, given the current input path, we need to find all the aliases that the current input path corresponds to.
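
The two lookups for corner case C1 can be sketched as follows. This is an illustration in Python under stated assumptions, not the ExecDriver or MapOperator code: the `TABLES` catalog, the helper names `is_under`, `input_dirs`, and `aliases_for`, and the idea of passing in a list of known directories are all hypothetical.

```python
import posixpath

# Hypothetical catalog for corner case C1: alias -> (location, recursive bit).
TABLES = {
    "abc_recursive": ("/abc", True),
    "abc": ("/abc", False),
    "abc_def_recursive": ("/abc/def", True),
    "abc_def": ("/abc/def", False),
}


def is_under(directory, root):
    """True if directory equals root or lies anywhere beneath it."""
    directory = posixpath.normpath(directory)
    root = posixpath.normpath(root)
    return directory == root or directory.startswith(root.rstrip("/") + "/")


def input_dirs(tables, known_dirs):
    """Directories to add to the job input, given every directory that exists."""
    selected = set()
    for location, recursive in tables.values():
        location = posixpath.normpath(location)
        for d in known_dirs:
            d = posixpath.normpath(d)
            # A recursive table takes its whole subtree; others only the location.
            if d == location or (recursive and is_under(d, location)):
                selected.add(d)
    return sorted(selected)


def aliases_for(tables, file_path):
    """Aliases that should receive rows read from file_path."""
    parent = posixpath.dirname(posixpath.normpath(file_path))
    matches = []
    for alias, (location, recursive) in tables.items():
        if recursive:
            hit = is_under(parent, location)
        else:
            hit = posixpath.normpath(location) == parent
        if hit:
            matches.append(alias)
    return sorted(matches)
```

In this sketch a file directly under /abc/def maps to abc_def, abc_def_recursive, and abc_recursive, but not to abc, because abc does not have the recursive bit set.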





[jira] Commented: (HIVE-1083) allow sub-directories for an external table/partition

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860450#action_12860450 ] 

John Sichi commented on HIVE-1083:
----------------------------------

Clarification: you can already get the desired behavior using HDFS, MAPREDUCE-1501, and mapred.input.dir.recursive=true, as long as your query doesn't hit one of the corner cases enumerated by Zheng.

What remains for this task are the following:

(1) support local file system as well (this failed when I tested it, but I didn't look into why)

(2) deal with HIVE-1133 to fix the corner cases







[jira] Assigned: (HIVE-1083) allow sub-directories for a table/partition

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain reassigned HIVE-1083:
--------------------------------

    Assignee: Vikram Chandrasekhar

> allow sub-directories for a table/partition
> -------------------------------------------
>
>                 Key: HIVE-1083
>                 URL: https://issues.apache.org/jira/browse/HIVE-1083
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Vikram Chandrasekhar
>             Fix For: 0.6.0
>
>
> Subdirectories should be allowed for tables/partitions.



[jira] Updated: (HIVE-1083) allow sub-directories for an external table/partition

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1083:
-----------------------------

          Description: 
Sometimes users want to define an external table/partition based on all files (recursively) inside a directory.

Currently most of the Hadoop InputFormat classes do not support that. We should extract all files recursively in the directory, and add them to the input path of the job.



  was:Subdirectories should be allowed for tables/partitions.

    Affects Version/s: 0.6.0
             Assignee: Zheng Shao
              Summary: allow sub-directories for an external table/partition  (was: allow sub-directories for a table/partition)

