Posted to dev@hive.apache.org by "Steven Wong (JIRA)" <ji...@apache.org> on 2011/05/09 20:06:03 UTC

[jira] [Commented] (HIVE-2087) Dynamic partition insert performance problem

    [ https://issues.apache.org/jira/browse/HIVE-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030836#comment-13030836 ] 

Steven Wong commented on HIVE-2087:
-----------------------------------

This problem seems to happen only when the partition spec contains no static partition column, that is, when every partition column is dynamic.
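
For illustration, the two cases might look like the sketch below. The table t2, the partition column ds, its value, and the columns col1 and col2 are hypothetical names that do not come from the issue; only T, p, and another_table appear in the original report. The two SET properties are standard Hive dynamic-partition settings. This only shows the shape of the statements and makes no claim about where the extra partition loading happens.

```sql
-- Hypothetical names throughout (t2, ds, col1, col2); a sketch, not a repro script.
SET hive.exec.dynamic.partition=true;

-- Case 1: no static partition column (where the slowdown seems to occur).
-- Strict mode rejects an all-dynamic partition spec, so nonstrict is required.
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE T PARTITION (p)
SELECT col1, col2, p FROM another_table;

-- Case 2: a static partition column (ds) placed before the dynamic one (p).
-- Assumes t2 is partitioned by (ds, p); static columns must come first.
INSERT OVERWRITE TABLE t2 PARTITION (ds='2011-05-09', p)
SELECT col1, col2, p FROM another_table;
```

In both cases the dynamic partition column has to be the last column in the SELECT list, since Hive matches dynamic partition columns by position rather than by name.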

> Dynamic partition insert performance problem
> --------------------------------------------
>
>                 Key: HIVE-2087
>                 URL: https://issues.apache.org/jira/browse/HIVE-2087
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 0.7.0
>         Environment: Amazon EMR, S3
>            Reporter: Q Long
>
> Create an external table T backed by S3 and partitioned by column P. Populate T so that it has a large number of partitions (say 100). Execute a statement like
> insert overwrite table T partition (p) select * from another_table
> and then check the Hive server log. It shows that all existing partitions are read and loaded before any mapper starts working. This seems excessive, given that the insert statement may create or overwrite only a very small number of partitions. Is there some other reason that an insert using dynamic partitions requires loading the whole table?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira