Posted to dev@hive.apache.org by "Sushanth Sowmyan (JIRA)" <ji...@apache.org> on 2013/08/06 22:25:47 UTC

[jira] [Commented] (HIVE-5011) Dynamic partitioning in HCatalog broken on external tables

    [ https://issues.apache.org/jira/browse/HIVE-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731226#comment-13731226 ] 

Sushanth Sowmyan commented on HIVE-5011:
----------------------------------------

Basic bug synopsis:

Say a table t1, partitioned by the key dayofweek:string, is located at "hdfs://blah/foo/t1/".

Ordinarily, if we try to write to it specifying that we're writing a partition dayofweek="sunday", then the location it'll write to is "hdfs://blah/foo/t1/dayofweek=sunday/".

Now, this is known before the MR jobs start, and is set as the location, and all is good. If the table is an external table and the user wants a custom location for the partition, say "hdfs://blah/foo/t1/sunday/", then HCatStorer currently allows them to specify that, and it will be honoured too.

That was the intent of HCATALOG-500, and the way it works for static partitioning.
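To make the static case concrete, here is a minimal sketch of the location resolution described above. It is not HCatalog code: the function name static_partition_location and its parameters are hypothetical, and it only mirrors the behaviour as stated in this comment (default hive-style directory, or a user-supplied location used as given).

```python
def static_partition_location(table_location, partition_spec, custom_location=None):
    """Return the directory a static-partition write would target.

    Hypothetical helper for illustration only; it is not part of HCatalog.
    """
    if custom_location is not None:
        # A user-specified location for an external table is used as given.
        return custom_location
    # Default "hive-style" layout: one key=value directory per partition key.
    parts = "/".join(f"{key}={value}" for key, value in partition_spec.items())
    return table_location.rstrip("/") + "/" + parts + "/"


default_location = static_partition_location(
    "hdfs://blah/foo/t1/", {"dayofweek": "sunday"}
)
custom_location = static_partition_location(
    "hdfs://blah/foo/t1/",
    {"dayofweek": "sunday"},
    custom_location="hdfs://blah/foo/t1/sunday/",
)
print(default_location)
print(custom_location)
```

Both locations can be computed before the job starts because the partition value is already known, which is exactly what does not hold for dynamic partitioning.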

With dynamic partitioning on external tables, however, this is what winds up happening since HCATALOG-500.

All the partitions being written to wind up having their location set to "hdfs://blah/foo/t1/dayofweek=__DEFAULT_HIVE_PARTITION__" if no override is provided, or to "hdfs://blah/foo/t1/whatever" if that location was provided as an override.

As a result, the first drone to write a partition takes this location, and every other drone is unable to open it for writing, stalls, gets retried, and eventually causes the job to fail. In theory, if the job had only one reducer and all the data fell into a single partition, the job might not fail, but that is such a constrained mode of writing that it makes the dynamic partitioning feature itself meaningless.
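The collision can be illustrated with a small simulation. This is only a sketch under the assumptions stated in this comment: the helper names are hypothetical, and the real code path in HCatalog is more involved. It shows that when the location is fixed before the job starts, every dynamic partition value maps to the same directory.

```python
DEFAULT_PARTITION = "__DEFAULT_HIVE_PARTITION__"


def location_fixed_before_job(table_location, partition_keys, override=None):
    """Location chosen up front, before any partition value is known.

    Hypothetical helper mirroring the broken behaviour described above.
    """
    if override is not None:
        return override
    # Values are unknown at this point, so a placeholder stands in for each.
    parts = "/".join(f"{key}={DEFAULT_PARTITION}" for key in partition_keys)
    return table_location.rstrip("/") + "/" + parts


def locations_seen_by_writers(table_location, partition_keys, values, override=None):
    """Map each partition value a writer handles to the location it is told to use."""
    fixed = location_fixed_before_job(table_location, partition_keys, override)
    return {value: fixed for value in values}


targets = locations_seen_by_writers(
    "hdfs://blah/foo/t1/", ["dayofweek"], ["sunday", "monday", "tuesday"]
)
distinct_locations = set(targets.values())
print(targets)
print(len(distinct_locations))
```

Three different partition values share one target directory, so only one writer can succeed in opening it.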
                
> Dynamic partitioning in HCatalog broken on external tables
> ----------------------------------------------------------
>
>                 Key: HIVE-5011
>                 URL: https://issues.apache.org/jira/browse/HIVE-5011
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>
> Dynamic partitioning with HCatalog has been broken as a result of HCATALOG-500 trying to support user-set paths for external tables.
> The goal there was to be able to support other custom destinations apart from the normal "hive-style" partitions. However, it is not currently possible for users to set paths for dynamic ptn writes, since we don't support any way for users to specify "patterns" (like, say, "${rootdir}/$v1.$v2/") into which writes happen, only "locations", and the values for dyn. partitions are not known ahead of time. Also, specifying a custom path messes with the way dynamic ptn. code tries to determine what was written to where from the output committer, which means that even if we supported patterned-writes instead of location-writes, we still have to do some more deep diving into the output committer code to support it.
> Thus, my current proposal is that we honour writes to user-specified paths for external tables *ONLY* for static partition writes - i.e., if we can determine that the write is a dyn. ptn. write, we will ignore the user specification. (Note that this does not mean we ignore the table's external location - we honour that - we just don't honour any HCatStorer/etc provided additional location - we stick to what metadata tells us the root location is.)
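The proposed rule can be sketched as a small decision function. This is an illustration of the proposal only, not the actual patch: resolve_write_root and its parameters are hypothetical names, and a partition value of None is used here as an assumed way of marking a dynamic partition key.

```python
def resolve_write_root(table_location, partition_spec, user_location=None):
    """Pick the location a write is rooted at under the proposed rule.

    Hypothetical helper. A value of None in partition_spec marks a key
    whose value is only known at write time (a dynamic partition).
    """
    is_dynamic = any(value is None for value in partition_spec.values())
    if is_dynamic:
        # Dynamic write: ignore any user-specified location and stick to
        # the table's root location as recorded in the metadata.
        return table_location
    if user_location is not None:
        # Static write on an external table: honour the user's location.
        return user_location
    # Static write without an override: default hive-style directory.
    parts = "/".join(f"{key}={value}" for key, value in partition_spec.items())
    return table_location.rstrip("/") + "/" + parts + "/"


static_root = resolve_write_root(
    "hdfs://blah/foo/t1/",
    {"dayofweek": "sunday"},
    user_location="hdfs://blah/foo/t1/sunday/",
)
dynamic_root = resolve_write_root(
    "hdfs://blah/foo/t1/",
    {"dayofweek": None},
    user_location="hdfs://blah/foo/t1/whatever",
)
print(static_root)
print(dynamic_root)
```

In the dynamic case the override is dropped and the table's own location is kept, so the per-partition directories can still be derived from the values seen at write time.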
