Posted to issues@hive.apache.org by "Oleksiy Sayankin (JIRA)" <ji...@apache.org> on 2018/02/13 15:52:00 UTC
[jira] [Comment Edited] (HIVE-18702) INSERT OVERWRITE TABLE doesn't clean the table directory before overwriting
[ https://issues.apache.org/jira/browse/HIVE-18702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362512#comment-16362512 ]
Oleksiy Sayankin edited comment on HIVE-18702 at 2/13/18 3:51 PM:
------------------------------------------------------------------
*FIXED*
*ROOT-CAUSE*
The following {{if}} condition evaluates to false
{code}
FileStatus[] statuses = HiveStatsUtils.getFileStatusRecurse(
    tmpPath, ((dpCtx == null) ? 1 : dpCtx.getNumDPCols()), fs);
if (statuses != null && statuses.length > 0) {
{code}
when there are no files in {{/bug/.hive-staging_hive_2018-02-13_14-14-39_529_3325659916929491937-1/_task_tmp.-ext-10000}}. As a result, the folder {{-ext-10000}} is not created. After that, this section of code
{code}
protected void replaceFiles(Path tablePath, Path srcf, Path destf, Path oldPath, HiveConf conf,
        boolean isSrcLocal, boolean purge) throws HiveException {
  try {
    FileSystem destFs = destf.getFileSystem(conf);
    // check if srcf contains nested sub-directories
    FileStatus[] srcs;
    FileSystem srcFs;
    try {
      srcFs = srcf.getFileSystem(conf);
      srcs = srcFs.globStatus(srcf);
    } catch (IOException e) {
      throw new HiveException("Getting globStatus " + srcf.toString(), e);
    }
    if (srcs == null) {
      LOG.info("No sources specified to move: " + srcf);
      return;
    }
{code}
takes the {{srcs == null}} branch, logs {{"No sources specified to move: " + srcf}}, and returns early, so the existing data in the table is not overwritten.
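The effect of that early return can be illustrated with a small local-filesystem analogy. This is only a sketch, not Hive code: it uses {{java.nio.file}} instead of the Hadoop {{FileSystem}} API, and the class {{OverwriteSketch}} and its helper methods are hypothetical names invented for the illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-filesystem analogy only; this is not Hive code.
final class OverwriteSketch {

    /** Lists the direct children of dir (hypothetical helper). */
    static List<Path> children(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Replaces the contents of dest with the contents of src.
     * Returns false without touching dest when src does not exist,
     * which mirrors the early return in the excerpt above.
     */
    static boolean replaceFiles(Path src, Path dest) {
        if (!Files.exists(src)) {
            return false; // old files in dest are left in place
        }
        try {
            for (Path old : children(dest)) {
                Files.delete(old);
            }
            for (Path fresh : children(src)) {
                Files.move(fresh, dest.resolve(fresh.getFileName()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    /** Creates a table directory holding one data file (stand-in for /bug/users.txt). */
    static Path tableWithOneFile() {
        try {
            Path table = Files.createTempDirectory("bug");
            Files.write(table.resolve("users.txt"),
                    "Peter,34\nJohn,25\nMary,28\n".getBytes());
            return table;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, calling {{replaceFiles}} with a source directory that was never created leaves {{users.txt}} in the table directory, while calling it with an existing but empty source directory empties the table directory, which is what INSERT OVERWRITE is expected to do.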
*SOLUTION*
Check {{fs.exists(tmpPath)}} instead of testing whether {{FileStatus[] statuses}} is non-empty, so that an empty {{tmpPath}} still passes the check.
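The difference between the two checks can be shown with a minimal sketch. It is an analogy on the local filesystem using {{java.nio.file}}, not the change from the attached patch; the class {{StagingCheckSketch}} and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Local-filesystem analogy only; this is not the HIVE-18702 patch.
final class StagingCheckSketch {

    /** Listing-based check: true only if tmpPath is a directory with at least one entry. */
    static boolean hasFiles(Path tmpPath) {
        if (!Files.isDirectory(tmpPath)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(tmpPath)) {
            return entries.findAny().isPresent();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Existence-based check: true even when the directory is empty. */
    static boolean exists(Path tmpPath) {
        return Files.exists(tmpPath);
    }

    /** Creates an empty directory, standing in for the output of a query that produced no rows. */
    static Path emptyStagingDir() {
        try {
            return Files.createTempDirectory("task_tmp");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a directory returned by {{emptyStagingDir()}}, {{hasFiles}} is false while {{exists}} is true, so only the existence-based check lets the subsequent step run when the query produced no output files.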
> INSERT OVERWRITE TABLE doesn't clean the table directory before overwriting
> ---------------------------------------------------------------------------
>
> Key: HIVE-18702
> URL: https://issues.apache.org/jira/browse/HIVE-18702
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.3.2
> Reporter: Oleksiy Sayankin
> Assignee: Oleksiy Sayankin
> Priority: Major
> Fix For: 3.0.0, 2.3.3
>
> Attachments: HIVE-18702.1.patch
>
>
> Enable Hive on Tez (the problem does not occur on MR).
> *STEP 1. Create test data*
> {code}
> nano /home/test/users.txt
> {code}
> Add to file:
> {code}
> Peter,34
> John,25
> Mary,28
> {code}
> {code}
> hadoop fs -mkdir /bug
> hadoop fs -copyFromLocal /home/test/users.txt /bug
> hadoop fs -ls /bug
> {code}
> *EXPECTED RESULT:*
> {code}
> Found 2 items
> -rwxr-xr-x 3 root root 25 2015-10-15 16:11 /bug/users.txt
> {code}
> *STEP 2. Upload data to Hive*
> {code}
> create external table bug(name string, age int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' LOCATION '/bug';
> select * from bug;
> {code}
> *EXPECTED RESULT:*
> {code}
> OK
> Peter 34
> John 25
> Mary 28
> {code}
> {code}
> create external table bug1(name string, age int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' LOCATION '/bug1';
> insert overwrite table bug select * from bug1;
> select * from bug;
> {code}
> *EXPECTED RESULT:*
> {code}
> OK
> Time taken: 0.097 seconds
> {code}
> *ACTUAL RESULT:*
> {code}
> hive> select * from bug;
> OK
> Peter 34
> John 25
> Mary 28
> Time taken: 0.198 seconds, Fetched: 3 row(s)
> {code}