You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Manpreet Singh (JIRA)" <ji...@apache.org> on 2018/06/26 04:38:00 UTC

[jira] [Commented] (HIVE-19991) msck repair table command not able to retrieve achieved data.

    [ https://issues.apache.org/jira/browse/HIVE-19991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523172#comment-16523172 ] 

Manpreet Singh commented on HIVE-19991:
---------------------------------------

This can be worked around by setting the location or using "alter table .. add partition .. location" instead of "msck".

 

 

> msck repair table command not able to retrieve achieved data.
> -------------------------------------------------------------
>
>                 Key: HIVE-19991
>                 URL: https://issues.apache.org/jira/browse/HIVE-19991
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Manpreet Singh
>            Priority: Major
>
>  Observed an issue when customer used msck repair tablename command on a archeived data (data copied via hadoop fs -cp from another location) by the command did not loaded the partitions in the table and showed zero results.
>  
> Please find below complete steps for both archived and unarchived.
>  Copied table's partition data to another location & define a similar table structure for new location an ran "msck repair table" command on the destination and then ran a select * from <destination> table it produced zero results, same thing is working for non achieved data.
>  
> 1. Created two table with similar structure and different loctation.
> create table sau_test1 ( a int, b string) partitioned by (dt string) stored as parquet location '/user/hive/warehouse/sau_test1';
> create table sau_arch ( a int, b string) partitioned by (dt string) stored as parquet location'/user/hive/warehouse/sau_arch';
> 2. Inserted data in source table
> insert into sau_test1 partition(dt='dt1') select 1,'A1';
> select * from sau_test1;
> +---------------+-------------++-----------------+
> |sau_test1.a|sau_test1.b|sau_test1.dt|
> +---------------+-------------++-----------------+
> |1|A1|dt1|
> +---------------+-------------++-----------------+
> 3. Copied the content of directory source directory to destination directory.
> hadoop fs -cp /user/hive/warehouse/sau_test1/* /user/hive/warehouse/sau_arch/
> 4. Running msck repair table <destination table> and checking results. — for unachieved data
> msck repair table sau_arch ;
> select * from sau_arch;
> select * from sau_arch;
> +--------------+------------++----------------+
> |sau_arch.a|sau_arch.b|sau_arch.dt|
> +--------------+------------++----------------+
> |1|A1|dt1|
> +--------------+------------++----------------+
> 5. Customer wants the same functionality for archived data and hence tried below steps.
> a) Dropped table partition in destination table "alter table sau_arch drop partition(dt='dt1');"
> b) set hive.archive.enabled=true;
> alter table sau_test1 archive partition ( dt='dt1');
> c) copied the hdfs files from source table to destination tables.
> hdfs dfs -ls /user/hive/warehouse/sau_test1/dt=dt1/
> drwxr-xr-x - hive supergroup 0 2018-06-08 13:26 /user/hive/warehouse/sau_test1/dt=dt1/data.har
> -rw-r--r-- 3 hive supergroup 0 2018-06-08 13:26 /user/hive/warehouse/sau_test1/dt=dt1/data.har/_SUCCESS
> -rw-r--r-- 3 hive supergroup 305 2018-06-08 13:26 /user/hive/warehouse/sau_test1/dt=dt1/data.har/_index
> -rw-r--r-- 3 hive supergroup 23 2018-06-08 13:26 /user/hive/warehouse/sau_test1/dt=dt1/data.har/_masterindex
> -rw-r--r-- 3 hive supergroup 286 2018-06-08 13:26 /user/hive/warehouse/sau_test1/dt=dt1/data.har/part-0
> $ hdfs dfs -ls /user/hive/warehouse/sau_arch/dt=dt1/
> drwxr-xr-x - ngdb supergroup 0 2018-06-08 13:27 /user/hive/warehouse/sau_arch/dt=dt1/data.har
> -rw-r--r-- 3 ngdb supergroup 0 2018-06-08 13:27 /user/hive/warehouse/sau_arch/dt=dt1/data.har/_SUCCESS
> -rw-r--r-- 3 ngdb supergroup 305 2018-06-08 13:27 /user/hive/warehouse/sau_arch/dt=dt1/data.har/_index
> -rw-r--r-- 3 ngdb supergroup 23 2018-06-08 13:27 /user/hive/warehouse/sau_arch/dt=dt1/data.har/_masterindex
> -rw-r--r-- 3 ngdb supergroup 286 2018-06-08 13:27 /user/hive/warehouse/sau_arch/dt=dt1/data.har/part-0
> d) msck repair table sau_arch;
> e)select * from sau_arch . - No results shown
> +--------------+------------++----------------+
> |sau_arch.a|sau_arch.b|sau_arch.dt|
> +--------------+------------++----------------+
> +--------------+------------++----------------+



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)