Posted to dev@hive.apache.org by "Franklin Hu (JIRA)" <ji...@apache.org> on 2011/07/21 01:13:58 UTC

[jira] [Created] (HIVE-2296) bad compressed file names from insert into

bad compressed file names from insert into
------------------------------------------

                 Key: HIVE-2296
                 URL: https://issues.apache.org/jira/browse/HIVE-2296
             Project: Hive
          Issue Type: Bug
            Reporter: Franklin Hu
            Assignee: Franklin Hu


When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:

Before INSERT INTO:
000000_0.gz

After INSERT INTO:
000000_0
000000_0.gz_copy_1

Correct behavior should be to pick a valid filename

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2296) bad compressed file names from insert into

Posted by "Franklin Hu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Franklin Hu updated HIVE-2296:
------------------------------

    Affects Version/s: 0.8.0

> bad compressed file names from insert into
> ------------------------------------------
>
>                 Key: HIVE-2296
>                 URL: https://issues.apache.org/jira/browse/HIVE-2296
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Franklin Hu
>            Assignee: Franklin Hu
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0
> 000000_0.gz_copy_1
> Correct behavior should be to pick a valid filename


[jira] [Commented] (HIVE-2296) bad compressed file names from insert into

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068794#comment-13068794 ] 

jiraposter@reviews.apache.org commented on HIVE-2296:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1155/
-----------------------------------------------------------

Review request for hive and Siying Dong.


Summary
-------

Fixes the problem of bad compressed file names by stripping off the file extension (e.g. ".gz") and re-appending it to the path after the "_copy_N" suffix.
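
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch to Hive.java: the class name CopyNameSketch and the method pickName are hypothetical, and a plain Set of existing names stands in for the file system lookup that Hive would perform.

```java
import java.util.Set;

// Hypothetical sketch; names and structure are assumptions, not Hive's actual code.
final class CopyNameSketch {

    /**
     * Returns a file name that does not collide with any name in {@code existing}.
     * On a collision, "_copy_N" is inserted before the extension (e.g. ".gz")
     * rather than appended after it, so the extension stays at the end.
     */
    static String pickName(String name, Set<String> existing) {
        if (!existing.contains(name)) {
            return name; // no collision, keep the name as-is
        }
        // Split into base name and extension; a leading dot is not an extension.
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        String ext = dot > 0 ? name.substring(dot) : "";

        // Try _copy_1, _copy_2, ... until an unused name is found.
        int n = 1;
        String candidate;
        do {
            candidate = base + "_copy_" + n + ext;
            n++;
        } while (existing.contains(candidate));
        return candidate;
    }
}
```

With an existing file 000000_0.gz, calling pickName("000000_0.gz", existing) yields 000000_0_copy_1.gz instead of 000000_0.gz_copy_1, which matches the corrected name given in the issue description.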


This addresses bug HIVE-2296.
    https://issues.apache.org/jira/browse/HIVE-2296


Diffs
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1148973 
  trunk/ql/src/test/queries/clientpositive/insert_compressed.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/insert_compressed.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1155/diff


Testing
-------

Unit tests pass


Thanks,

Franklin



> bad compressed file names from insert into
> ------------------------------------------
>
>                 Key: HIVE-2296
>                 URL: https://issues.apache.org/jira/browse/HIVE-2296
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Franklin Hu
>            Assignee: Franklin Hu
>         Attachments: hive-2296.1.patch, hive-2296.2.patch
>
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0.gz
> 000000_0.gz_copy_1
> This causes corrupted output when doing a SELECT * on the table.
> Correct behavior should be to pick a valid filename such as:
> 000000_0_copy_1.gz


[jira] [Updated] (HIVE-2296) bad compressed file names from insert into

Posted by "Franklin Hu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Franklin Hu updated HIVE-2296:
------------------------------

    Attachment: hive-2296.1.patch

> bad compressed file names from insert into
> ------------------------------------------
>
>                 Key: HIVE-2296
>                 URL: https://issues.apache.org/jira/browse/HIVE-2296
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Franklin Hu
>            Assignee: Franklin Hu
>         Attachments: hive-2296.1.patch
>
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0.gz
> 000000_0.gz_copy_1
> Correct behavior should be to pick a valid filename


[jira] [Updated] (HIVE-2296) bad compressed file names from insert into

Posted by "Franklin Hu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Franklin Hu updated HIVE-2296:
------------------------------

    Description: 
When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:

Before INSERT INTO:
000000_0.gz

After INSERT INTO:
000000_0.gz
000000_0.gz_copy_1

Correct behavior should be to pick a valid filename

  was:
When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:

Before INSERT INTO:
000000_0.gz

After INSERT INTO:
000000_0
000000_0.gz_copy_1

Correct behavior should be to pick a valid filename


> bad compressed file names from insert into
> ------------------------------------------
>
>                 Key: HIVE-2296
>                 URL: https://issues.apache.org/jira/browse/HIVE-2296
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Franklin Hu
>            Assignee: Franklin Hu
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0.gz
> 000000_0.gz_copy_1
> Correct behavior should be to pick a valid filename


[jira] [Work stopped] (HIVE-2296) bad compressed file names from insert into

Posted by "Franklin Hu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-2296 stopped by Franklin Hu.

> bad compressed file names from insert into
> ------------------------------------------
>
>                 Key: HIVE-2296
>                 URL: https://issues.apache.org/jira/browse/HIVE-2296
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Franklin Hu
>            Assignee: Franklin Hu
>             Fix For: 0.8.0
>
>         Attachments: hive-2296.1.patch, hive-2296.2.patch
>
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0.gz
> 000000_0.gz_copy_1
> This causes corrupted output when doing a SELECT * on the table.
> Correct behavior should be to pick a valid filename such as:
> 000000_0_copy_1.gz


[jira] [Commented] (HIVE-2296) bad compressed file names from insert into

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069314#comment-13069314 ] 

Siying Dong commented on HIVE-2296:
-----------------------------------

+1


[jira] [Updated] (HIVE-2296) bad compressed file names from insert into

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2296:
------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed. Thanks, Franklin!


[jira] [Work started] (HIVE-2296) bad compressed file names from insert into

Posted by "Franklin Hu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-2296 started by Franklin Hu.


[jira] [Updated] (HIVE-2296) bad compressed file names from insert into

Posted by "Franklin Hu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Franklin Hu updated HIVE-2296:
------------------------------

    Description: 
When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:

Before INSERT INTO:
000000_0.gz

After INSERT INTO:
000000_0.gz
000000_0.gz_copy_1

This causes corrupted output when doing a SELECT * on the table.
Correct behavior should be to pick a valid filename such as:
000000_0_copy_1.gz

  was:
When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:

Before INSERT INTO:
000000_0.gz

After INSERT INTO:
000000_0.gz
000000_0.gz_copy_1

Correct behavior should be to pick a valid filename


> bad compressed file names from insert into
> ------------------------------------------
>
>                 Key: HIVE-2296
>                 URL: https://issues.apache.org/jira/browse/HIVE-2296
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Franklin Hu
>            Assignee: Franklin Hu
>         Attachments: hive-2296.1.patch
>
>
> When INSERT INTO is run on a table with compressed output (hive.exec.compress.output=true) and existing files in the table, it may copy the new files in bad file names:
> Before INSERT INTO:
> 000000_0.gz
> After INSERT INTO:
> 000000_0.gz
> 000000_0.gz_copy_1
> This causes corrupted output when doing a SELECT * on the table.
> Correct behavior should be to pick a valid filename such as:
> 000000_0_copy_1.gz


[jira] [Updated] (HIVE-2296) bad compressed file names from insert into

Posted by "Franklin Hu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Franklin Hu updated HIVE-2296:
------------------------------

    Fix Version/s: 0.8.0
           Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-2296) bad compressed file names from insert into

Posted by "Franklin Hu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Franklin Hu updated HIVE-2296:
------------------------------

    Attachment: hive-2296.2.patch

Added a unit test.


[jira] [Commented] (HIVE-2296) bad compressed file names from insert into

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069885#comment-13069885 ] 

Hudson commented on HIVE-2296:
------------------------------

Integrated in Hive-trunk-h0.21 #843 (See [https://builds.apache.org/job/Hive-trunk-h0.21/843/])
    HIVE-2296. bad compressed file names from insert into (Franklin Hu via Siying Dong)

sdong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1149724
Files : 
* /hive/trunk/ql/src/test/queries/clientpositive/insert_compressed.q
* /hive/trunk/ql/src/test/results/clientpositive/insert_compressed.q.out
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java

