Posted to dev@hive.apache.org by "Krishna Kumar (JIRA)" <ji...@apache.org> on 2011/08/29 12:23:37 UTC

[jira] [Created] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

Merging of compressed rcfiles fails to write the valuebuffer part correctly
---------------------------------------------------------------------------

                 Key: HIVE-2417
                 URL: https://issues.apache.org/jira/browse/HIVE-2417
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Krishna Kumar
            Assignee: Krishna Kumar
         Attachments: HIVE-2417.v0.patch

The blockmerge task does not create proper rc files when merging compressed rc files as the valuebuffer writing is incorrect.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094963#comment-13094963 ] 

He Yongqiang commented on HIVE-2417:
------------------------------------

+1, will commit after tests pass


        

[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095608#comment-13095608 ] 

He Yongqiang commented on HIVE-2417:
------------------------------------

Committed, thanks Krishna Kumar!


        

[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094287#comment-13094287 ] 

He Yongqiang commented on HIVE-2417:
------------------------------------

Good catch, this is a regression introduced in HIVE-2396.
Can you make the testcase reproduce the problem more reliably? I mean that without the change in this diff, running that testcase should produce an error or incorrect results.

1. Remove this line: "+set mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;".
2. tgt_rc_merge_test only contains one file, so the 'alter table tgt_rc_merge_test concatenate;' will basically do nothing. Can you make sure this table contains at least 2 files? You can upload 2 gzip-compressed rcfiles if it does not.





        

[jira] [Updated] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

Posted by "Krishna Kumar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Kumar updated HIVE-2417:
--------------------------------

    Attachment: HIVE-2417.v0.patch

Test added


        

[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095920#comment-13095920 ] 

Hudson commented on HIVE-2417:
------------------------------

Integrated in Hive-trunk-h0.21 #928 (See [https://builds.apache.org/job/Hive-trunk-h0.21/928/])
    HIVE-2417: Merging of compressed rcfiles fails to write the valuebuffer part correctly (Krishna Kumar via He Yongqiang)

heyongqiang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1164278
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
* /hive/trunk/ql/src/test/queries/clientpositive/create_merge_compressed.q
* /hive/trunk/ql/src/test/results/clientpositive/create_merge_compressed.q.out



        

[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

Posted by "Krishna Kumar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094294#comment-13094294 ] 

Krishna Kumar commented on HIVE-2417:
-------------------------------------

Yes, the test is designed to produce the error when run without the change. Are you finding that that's not the case? I get an EOFException while running the same steps in my development environment (i.e., not as a unit test).

1. This is needed so that the rcfiles in the target table are compressed with Bzip2. Do you mean that we should be using Default compression codec instead? Fine with me but why is that important?

2. tgt does contain more than one file.

[before alter]
+POSTHOOK: query: show table extended like `tgt_rc_merge_test`
...
+totalNumberFiles:2
...
[after alter]
+POSTHOOK: query: show table extended like `tgt_rc_merge_test`
...
+totalNumberFiles:1

The 'create' adds one file, and the insert adds another file. [OT: Does it make sense to append a block merge task after a non-overwrite insert? Dunno...]


        

[jira] [Updated] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

Posted by "Krishna Kumar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Kumar updated HIVE-2417:
--------------------------------

    Status: Patch Available  (was: Open)


        

[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094298#comment-13094298 ] 

He Yongqiang commented on HIVE-2417:
------------------------------------

By "2 inserts", I mean remove the "load" command and use 2 inserts to populate the data.


        

[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094297#comment-13094297 ] 

He Yongqiang commented on HIVE-2417:
------------------------------------

> The 'create' adds one file, and the insert adds another file.

Sorry, I thought you were doing an "insert overwrite". Can you do 2 inserts?

> This is needed so that the rcfiles in the target table are compressed with Bzip2. Do you mean that we should be using Default compression codec instead? Fine with me but why is that important?

Yes. I mean that if you remove this line and keep the line "set hive.exec.compress.output = true;", the output will be compressed using DefaultCodec. The reason is that BZip2 may not be installed for all Hive users/devs.


        

[jira] [Updated] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-2417:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)


        

[jira] [Updated] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly

Posted by "Krishna Kumar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishna Kumar updated HIVE-2417:
--------------------------------

    Attachment: HIVE-2417.v1.patch

Test changed after review comments
 - default codec instead of bzip2
 - Create + 2 inserts instead of CTAS + 1 insert
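
For readers without the patch at hand, the revised test shape described above might look roughly like the following. This is only a sketch assembled from the statements quoted in this thread, not the committed create_merge_compressed.q: the source table `src` and the (key, value) column list are assumptions made for illustration.

```sql
-- Sketch only: `src` and the column list are assumed, not taken from the patch.
-- No codec is set explicitly, so the output uses DefaultCodec.
set hive.exec.compress.output = true;

create table tgt_rc_merge_test (key string, value string) stored as rcfile;

-- Two non-overwrite inserts, so the target table ends up with more than one
-- compressed rcfile for the merge to work on.
insert into table tgt_rc_merge_test select key, value from src;
insert into table tgt_rc_merge_test select key, value from src;

-- File count before the merge (see totalNumberFiles in the output).
show table extended like `tgt_rc_merge_test`;

-- Triggers the block merge task.
alter table tgt_rc_merge_test concatenate;

-- File count after the merge.
show table extended like `tgt_rc_merge_test`;

-- Reading the merged data back; an incorrectly written merged file would be
-- expected to surface as an error or wrong result on a read like this one.
select count(1) from tgt_rc_merge_test;
```

Comparing totalNumberFiles before and after the concatenate confirms that the merge actually ran, and the final read exercises the merged file itself.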
