Posted to dev@hive.apache.org by "Franklin Hu (JIRA)" <ji...@apache.org> on 2011/07/21 01:13:58 UTC
[jira] [Created] (HIVE-2296) bad compressed file names from insert into
bad compressed file names from insert into
------------------------------------------
Key: HIVE-2296
URL: https://issues.apache.org/jira/browse/HIVE-2296
Project: Hive
Issue Type: Bug
Reporter: Franklin Hu
Assignee: Franklin Hu
When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
Before INSERT INTO:
000000_0.gz
After INSERT INTO:
000000_0
000000_0.gz_copy_1
Correct behavior should be to pick a valid filename
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2296) bad compressed file names from insert into
Posted by "Franklin Hu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Franklin Hu updated HIVE-2296:
------------------------------
Affects Version/s: 0.8.0
> bad compressed file names from insert into
> ------------------------------------------
>
> Key: HIVE-2296
> URL: https://issues.apache.org/jira/browse/HIVE-2296
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Franklin Hu
> Assignee: Franklin Hu
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0
> 000000_0.gz_copy_1
> Correct behavior should be to pick a valid filename
[jira] [Commented] (HIVE-2296) bad compressed file names from insert into
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068794#comment-13068794 ]
jiraposter@reviews.apache.org commented on HIVE-2296:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1155/
-----------------------------------------------------------
Review request for hive and Siying Dong.
Summary
-------
Fixes the problem of bad compressed file names by stripping off the file extension (e.g. ".gz") and reappending it to the path later.
This addresses bug HIVE-2296.
https://issues.apache.org/jira/browse/HIVE-2296
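The approach in the summary above can be sketched roughly as follows. This is an illustration only, not the code from the patch to Hive.java, which is not shown in this thread. The class name CopyNameSketch, the method copyName, and the choice to split at the first '.' are assumptions made for the example; only the "_copy_N" suffix and the file names come from the issue description.

```java
import java.util.HashSet;
import java.util.Set;

public class CopyNameSketch {

    /**
     * Returns a name that does not collide with any name in 'existing'.
     * Hypothetical helper: the extension is split off before "_copy_N" is
     * added and reappended afterwards, so the result keeps a suffix such as
     * ".gz". Splitting at the first '.' is an assumption of this sketch.
     */
    public static String copyName(String name, Set<String> existing) {
        if (!existing.contains(name)) {
            return name; // no collision, keep the name as it is
        }
        int dot = name.indexOf('.');
        String base = dot < 0 ? name : name.substring(0, dot);
        String ext = dot < 0 ? "" : name.substring(dot);
        for (int i = 1; ; i++) {
            String candidate = base + "_copy_" + i + ext;
            if (!existing.contains(candidate)) {
                return candidate;
            }
        }
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>();
        existing.add("000000_0.gz");
        // prints 000000_0_copy_1.gz
        System.out.println(copyName("000000_0.gz", existing));
    }
}
```

With the extension kept at the end of the name, a reader that picks a decompression codec from the file suffix would still treat the copied file as gzip.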
Diffs
-----
trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1148973
trunk/ql/src/test/queries/clientpositive/insert_compressed.q PRE-CREATION
trunk/ql/src/test/results/clientpositive/insert_compressed.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/1155/diff
Testing
-------
Unit tests pass
Thanks,
Franklin
> bad compressed file names from insert into
> ------------------------------------------
>
> Key: HIVE-2296
> URL: https://issues.apache.org/jira/browse/HIVE-2296
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Franklin Hu
> Assignee: Franklin Hu
> Attachments: hive-2296.1.patch, hive-2296.2.patch
>
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0.gz
> 000000_0.gz_copy_1
> This causes corrupted output when doing a SELECT * on the table.
> Correct behavior should be to pick a valid filename such as:
> 000000_0_copy_1.gz
[jira] [Updated] (HIVE-2296) bad compressed file names from insert into
Posted by "Franklin Hu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Franklin Hu updated HIVE-2296:
------------------------------
Attachment: hive-2296.1.patch
> bad compressed file names from insert into
> ------------------------------------------
>
> Key: HIVE-2296
> URL: https://issues.apache.org/jira/browse/HIVE-2296
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Franklin Hu
> Assignee: Franklin Hu
> Attachments: hive-2296.1.patch
>
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0.gz
> 000000_0.gz_copy_1
> Correct behavior should be to pick a valid filename
[jira] [Updated] (HIVE-2296) bad compressed file names from insert into
Posted by "Franklin Hu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Franklin Hu updated HIVE-2296:
------------------------------
Description:
When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
Before INSERT INTO:
000000_0.gz
After INSERT INTO:
000000_0.gz
000000_0.gz_copy_1
Correct behavior should be to pick a valid filename
was:
When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
Before INSERT INTO:
000000_0.gz
After INSERT INTO:
000000_0
000000_0.gz_copy_1
Correct behavior should be to pick a valid filename
> bad compressed file names from insert into
> ------------------------------------------
>
> Key: HIVE-2296
> URL: https://issues.apache.org/jira/browse/HIVE-2296
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Franklin Hu
> Assignee: Franklin Hu
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0.gz
> 000000_0.gz_copy_1
> Correct behavior should be to pick a valid filename
[jira] [Work stopped] (HIVE-2296) bad compressed file names from insert into
Posted by "Franklin Hu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HIVE-2296 stopped by Franklin Hu.
> bad compressed file names from insert into
> ------------------------------------------
>
> Key: HIVE-2296
> URL: https://issues.apache.org/jira/browse/HIVE-2296
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Franklin Hu
> Assignee: Franklin Hu
> Fix For: 0.8.0
>
> Attachments: hive-2296.1.patch, hive-2296.2.patch
>
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0.gz
> 000000_0.gz_copy_1
> This causes corrupted output when doing a SELECT * on the table.
> Correct behavior should be to pick a valid filename such as:
> 000000_0_copy_1.gz
[jira] [Commented] (HIVE-2296) bad compressed file names from insert into
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069314#comment-13069314 ]
Siying Dong commented on HIVE-2296:
-----------------------------------
+1
> bad compressed file names from insert into
> ------------------------------------------
>
> Key: HIVE-2296
> URL: https://issues.apache.org/jira/browse/HIVE-2296
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Franklin Hu
> Assignee: Franklin Hu
> Fix For: 0.8.0
>
> Attachments: hive-2296.1.patch, hive-2296.2.patch
>
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0.gz
> 000000_0.gz_copy_1
> This causes corrupted output when doing a SELECT * on the table.
> Correct behavior should be to pick a valid filename such as:
> 000000_0_copy_1.gz
[jira] [Updated] (HIVE-2296) bad compressed file names from insert into
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-2296:
------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Committed. Thanks, Franklin!
> bad compressed file names from insert into
> ------------------------------------------
>
> Key: HIVE-2296
> URL: https://issues.apache.org/jira/browse/HIVE-2296
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Franklin Hu
> Assignee: Franklin Hu
> Fix For: 0.8.0
>
> Attachments: hive-2296.1.patch, hive-2296.2.patch
>
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0.gz
> 000000_0.gz_copy_1
> This causes corrupted output when doing a SELECT * on the table.
> Correct behavior should be to pick a valid filename such as:
> 000000_0_copy_1.gz
[jira] [Work started] (HIVE-2296) bad compressed file names from insert into
Posted by "Franklin Hu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HIVE-2296 started by Franklin Hu.
> bad compressed file names from insert into
> ------------------------------------------
>
> Key: HIVE-2296
> URL: https://issues.apache.org/jira/browse/HIVE-2296
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Franklin Hu
> Assignee: Franklin Hu
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0.gz
> 000000_0.gz_copy_1
> Correct behavior should be to pick a valid filename
[jira] [Updated] (HIVE-2296) bad compressed file names from insert into
Posted by "Franklin Hu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Franklin Hu updated HIVE-2296:
------------------------------
Description:
When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
Before INSERT INTO:
000000_0.gz
After INSERT INTO:
000000_0.gz
000000_0.gz_copy_1
This causes corrupted output when doing a SELECT * on the table.
Correct behavior should be to pick a valid filename such as:
000000_0_copy_1.gz
was:
When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
Before INSERT INTO:
000000_0.gz
After INSERT INTO:
000000_0.gz
000000_0.gz_copy_1
Correct behavior should be to pick a valid filename
> bad compressed file names from insert into
> ------------------------------------------
>
> Key: HIVE-2296
> URL: https://issues.apache.org/jira/browse/HIVE-2296
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Franklin Hu
> Assignee: Franklin Hu
> Attachments: hive-2296.1.patch
>
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0.gz
> 000000_0.gz_copy_1
> This causes corrupted output when doing a SELECT * on the table.
> Correct behavior should be to pick a valid filename such as:
> 000000_0_copy_1.gz
[jira] [Updated] (HIVE-2296) bad compressed file names from insert into
Posted by "Franklin Hu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Franklin Hu updated HIVE-2296:
------------------------------
Fix Version/s: 0.8.0
Status: Patch Available (was: Open)
> bad compressed file names from insert into
> ------------------------------------------
>
> Key: HIVE-2296
> URL: https://issues.apache.org/jira/browse/HIVE-2296
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Franklin Hu
> Assignee: Franklin Hu
> Fix For: 0.8.0
>
> Attachments: hive-2296.1.patch, hive-2296.2.patch
>
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0.gz
> 000000_0.gz_copy_1
> This causes corrupted output when doing a SELECT * on the table.
> Correct behavior should be to pick a valid filename such as:
> 000000_0_copy_1.gz
[jira] [Updated] (HIVE-2296) bad compressed file names from insert into
Posted by "Franklin Hu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Franklin Hu updated HIVE-2296:
------------------------------
Attachment: hive-2296.2.patch
Added unit test.
> bad compressed file names from insert into
> ------------------------------------------
>
> Key: HIVE-2296
> URL: https://issues.apache.org/jira/browse/HIVE-2296
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Franklin Hu
> Assignee: Franklin Hu
> Attachments: hive-2296.1.patch, hive-2296.2.patch
>
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0.gz
> 000000_0.gz_copy_1
> This causes corrupted output when doing a SELECT * on the table.
> Correct behavior should be to pick a valid filename such as:
> 000000_0_copy_1.gz
[jira] [Commented] (HIVE-2296) bad compressed file names from insert into
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069885#comment-13069885 ]
Hudson commented on HIVE-2296:
------------------------------
Integrated in Hive-trunk-h0.21 #843 (See [https://builds.apache.org/job/Hive-trunk-h0.21/843/])
HIVE-2296. bad compressed file names from insert into (Franklin Hu via Siying Dong)
sdong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1149724
Files :
* /hive/trunk/ql/src/test/queries/clientpositive/insert_compressed.q
* /hive/trunk/ql/src/test/results/clientpositive/insert_compressed.q.out
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> bad compressed file names from insert into
> ------------------------------------------
>
> Key: HIVE-2296
> URL: https://issues.apache.org/jira/browse/HIVE-2296
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Franklin Hu
> Assignee: Franklin Hu
> Fix For: 0.8.0
>
> Attachments: hive-2296.1.patch, hive-2296.2.patch
>
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0.gz
> 000000_0.gz_copy_1
> This causes corrupted output when doing a SELECT * on the table.
> Correct behavior should be to pick a valid filename such as:
> 000000_0_copy_1.gz