Posted to issues@hive.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/04/15 13:10:00 UTC

[jira] [Work logged] (HIVE-21564) Load data into a bucketed table is ignoring partitions specs and loads data into default partition.

     [ https://issues.apache.org/jira/browse/HIVE-21564?focusedWorklogId=227664&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-227664 ]

ASF GitHub Bot logged work on HIVE-21564:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Apr/19 13:09
            Start Date: 15/Apr/19 13:09
    Worklog Time Spent: 10m 
      Work Description: sankarh commented on pull request #597: HIVE-21564: Load data into a bucketed table is ignoring partitions specs and loads data into default partition.
URL: https://github.com/apache/hive/pull/597
 
 
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 227664)
            Time Spent: 10m
    Remaining Estimate: 0h

> Load data into a bucketed table is ignoring partitions specs and loads data into default partition.
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21564
>                 URL: https://issues.apache.org/jira/browse/HIVE-21564
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 4.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21564.01.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the command below is run to load data into a bucketed table, the data is not loaded into the specified partition; it is loaded into the default partition instead.
> LOAD DATA INPATH '/tmp/files/000000_0' OVERWRITE INTO TABLE call PARTITION(year_partition=2012, month=12);
> SELECT * FROM call WHERE year_partition=2012 AND month=12; --> returns 0 rows.
> {code}
> CREATE TABLE call( 
> date_time_date date, 
> ssn string, 
> name string, 
> location string) 
> PARTITIONED BY ( 
> year_partition int, 
> month int) 
> CLUSTERED BY ( 
> date_time_date) 
> SORTED BY ( 
> date_time_date ASC) 
> INTO 1 BUCKETS 
> STORED AS ORC;
> {code}
> If hive.exec.dynamic.partition is set to false, the load fails with the error below.
> {code}
> Error: Error while compiling statement: FAILED: SemanticException 1:18 Dynamic partition is disabled. Either enable it by setting hive.exec.dynamic.partition=true or specify partition column values. Error encountered near token 'month' (state=42000,code=40000)
> {code}
> With "set hive.strict.checks.bucketing=false;", the load works fine.
> The strict check is a behaviour introduced by HIVE-15148 to avoid loading incorrectly named data files into bucketed tables. In the customer use case, if the files are named properly with the bucket_id (00000_0, 00000_1 etc), then it is safe to set this flag to false.
> However, the current behaviour of loading into the default partition when hive.strict.checks.bucketing=true and partitions are specified is a bug introduced by HIVE-19311, where the given query is rewritten into an insert query (to handle incorrect file names and ORC versions) but the rewrite fails to incorporate the partition specs.
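> The workaround mentioned in this description can be sketched as the session below. It reuses the table, path, and partition values from this report and assumes the source file is already named by bucket id; it is an illustration of the described behaviour, not a verified reproduction.
> {code}
> -- Assumption: the input file already follows the bucket naming convention,
> -- so skipping the strict bucketing check is safe for this load.
> SET hive.strict.checks.bucketing=false;
> LOAD DATA INPATH '/tmp/files/000000_0' OVERWRITE INTO TABLE call PARTITION(year_partition=2012, month=12);
> -- Per the report, with the check disabled the load works,
> -- so this query is expected to return the loaded rows.
> SELECT * FROM call WHERE year_partition=2012 AND month=12;
> {code}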



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)