Posted to dev@hive.apache.org by "Ramkumar Vadali (JIRA)" <ji...@apache.org> on 2011/08/24 21:06:33 UTC

[jira] [Created] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

Allow RCFile Reader to tolerate corruptions
-------------------------------------------

                 Key: HIVE-2404
                 URL: https://issues.apache.org/jira/browse/HIVE-2404
             Project: Hive
          Issue Type: Improvement
          Components: Query Processor
    Affects Versions: 0.7.1
            Reporter: Ramkumar Vadali
            Assignee: Ramkumar Vadali
            Priority: Minor


Sometimes it is useful to tolerate corruptions during a query and return results based on the files that can be processed. A single corrupt block of data should not prevent reading the rest of the data.

We need a way to gracefully ignore errors while reading an RCFile.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092390#comment-13092390 ] 

jiraposter@reviews.apache.org commented on HIVE-2404:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1660/
-----------------------------------------------------------

(Updated 2011-08-27 23:13:24.160233)


Review request for Yongqiang He and Paul Yang.


Changes
-------

Added unit-test, also handled ChecksumException


Summary
-------

Sometimes it is useful to tolerate corruptions during a query and return results based on the files that can be processed. A single corrupt block of data should not prevent reading the rest of the data.

We need a way to gracefully ignore errors while reading an RCFile.


This addresses bug HIVE-2404.
    https://issues.apache.org/jira/browse/HIVE-2404


Diffs (updated)
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 1161660 
  trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 1161660 

Diff: https://reviews.apache.org/r/1660/diff


Testing
-------

Manual testing with a corrupt RC file


Thanks,

Ramkumar




        

[jira] [Commented] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096476#comment-13096476 ] 

He Yongqiang commented on HIVE-2404:
------------------------------------

+1, will commit after tests pass


        

[jira] [Commented] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092397#comment-13092397 ] 

jiraposter@reviews.apache.org commented on HIVE-2404:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1671/
-----------------------------------------------------------

Review request for Yongqiang He and Paul Yang.


Summary
-------

Sometimes it is useful to tolerate corruptions during a query and return results based on the files that can be processed. A single corrupt block of data should not prevent reading the rest of the data.

We need a way to gracefully ignore errors while reading an RCFile.


This addresses bug HIVE-2404.
    https://issues.apache.org/jira/browse/HIVE-2404


Diffs
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 1161660 
  trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 1161660 

Diff: https://reviews.apache.org/r/1671/diff


Testing
-------

Manual test with corrupt RC file, added unit-test


Thanks,

Ramkumar




        

[jira] [Commented] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096323#comment-13096323 ] 

jiraposter@reviews.apache.org commented on HIVE-2404:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1671/
-----------------------------------------------------------

(Updated 2011-09-02 21:31:47.700607)


Review request for Yongqiang He and Paul Yang.


Changes
-------

Addressed code-review feedback


Summary
-------

Sometimes it is useful to tolerate corruptions during a query and return results based on the files that can be processed. A single corrupt block of data should not prevent reading the rest of the data.

We need a way to gracefully ignore errors while reading an RCFile.


This addresses bug HIVE-2404.
    https://issues.apache.org/jira/browse/HIVE-2404


Diffs (updated)
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 1161660 
  trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 1161660 

Diff: https://reviews.apache.org/r/1671/diff


Testing
-------

Manual test with corrupt RC file, added unit-test


Thanks,

Ramkumar




        

[jira] [Commented] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098490#comment-13098490 ] 

Hudson commented on HIVE-2404:
------------------------------

Integrated in Hive-trunk-h0.21 #937 (See [https://builds.apache.org/job/Hive-trunk-h0.21/937/])
    HIVE-2404: Allow RCFile Reader to tolerate corruptions (Ramkumar Vadali via He Yongqiang)

heyongqiang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1165763
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java



        

[jira] [Updated] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

Posted by "Ramkumar Vadali (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali updated HIVE-2404:
----------------------------------

    Attachment: toleratecorruptions.patch

This patch adds a configuration option, hive.io.rcfile.tolerate.corruptions. If the option is set to true:
 * Lazy decompression is disabled.
 * Unexpected errors are treated as corruptions, and the reader indicates that there is no more data.

This allows the read to terminate gracefully when there are corruptions.

The default value of hive.io.rcfile.tolerate.corruptions is false.

Tested this by using rcfilecat on a file with a corrupt block of data.
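
To illustrate the behavior described above, here is a minimal sketch in plain Java. It is not the code from the patch: the BlockSource interface, the readAll method and the failingAfter helper are hypothetical names made up for this example, and a simulated IOException stands in for a corrupt block. It only shows the general idea that, when the flag is true, an unexpected error ends the read as if there were no more data, and when it is false the error propagates as before.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class TolerantReaderSketch {

    /** Hypothetical stand-in for a block-oriented reader (not the RCFile API). */
    public interface BlockSource {
        /** Returns the next block, or null when there is no more data. */
        String nextBlock() throws IOException;
    }

    /**
     * Reads blocks until the source is exhausted. If tolerateCorruptions is
     * true, an unexpected error is treated as a corruption and the read stops
     * as though the end of the data had been reached. Otherwise it is rethrown.
     */
    public static List<String> readAll(BlockSource source, boolean tolerateCorruptions)
            throws IOException {
        List<String> blocks = new ArrayList<>();
        while (true) {
            String block;
            try {
                block = source.nextBlock();
            } catch (IOException | RuntimeException e) {
                if (!tolerateCorruptions) {
                    throw e; // default behavior: the error fails the read
                }
                break; // tolerant mode: report "no more data"
            }
            if (block == null) {
                break; // normal end of data
            }
            blocks.add(block);
        }
        return blocks;
    }

    /** Hypothetical test source: yields the given blocks, then simulates a corrupt block. */
    public static BlockSource failingAfter(List<String> goodBlocks) {
        Iterator<String> it = goodBlocks.iterator();
        return () -> {
            if (it.hasNext()) {
                return it.next();
            }
            throw new IOException("simulated corrupt block");
        };
    }

    public static void main(String[] args) {
        List<String> good = Arrays.asList("block-0", "block-1");
        try {
            // Tolerant mode keeps the blocks read before the corruption.
            System.out.println(readAll(failingAfter(good), true));
            // Strict mode (the default) surfaces the error.
            readAll(failingAfter(good), false);
        } catch (IOException e) {
            System.out.println("strict mode failed: " + e.getMessage());
        }
    }
}
```

Note that in this sketch everything after the first error is dropped, which matches "the reader indicates no more data" above; it does not try to skip past the corrupt block and resume.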


        

[jira] [Commented] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092393#comment-13092393 ] 

jiraposter@reviews.apache.org commented on HIVE-2404:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1660/
-----------------------------------------------------------

(Updated 2011-08-27 23:17:16.153142)


Review request for Yongqiang He and Paul Yang.


Changes
-------

Revert unintentional changes to TestRCFile


Summary
-------

Sometimes it is useful to tolerate corruptions during a query and return results based on the files that can be processed. A single corrupt block of data should not prevent reading the rest of the data.

We need a way to gracefully ignore errors while reading an RCFile.


This addresses bug HIVE-2404.
    https://issues.apache.org/jira/browse/HIVE-2404


Diffs (updated)
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 1161660 
  trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 1161660 

Diff: https://reviews.apache.org/r/1660/diff


Testing
-------

Manual testing with a corrupt RC file


Thanks,

Ramkumar




        

[jira] [Commented] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090416#comment-13090416 ] 

He Yongqiang commented on HIVE-2404:
------------------------------------

Can you create a Review Board request for this? https://reviews.apache.org/r/new/
Thanks.


        

[jira] [Commented] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096295#comment-13096295 ] 

He Yongqiang commented on HIVE-2404:
------------------------------------

Awesome feature! I left some nitpick comments on Review Board. Thanks!


        

[jira] [Commented] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091163#comment-13091163 ] 

jiraposter@reviews.apache.org commented on HIVE-2404:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1660/
-----------------------------------------------------------

Review request for Yongqiang He and Paul Yang.


Summary
-------

Sometimes it is useful to tolerate corruptions during a query and return results based on the files that can be processed. A single corrupt block of data should not prevent reading the rest of the data.

We need a way to gracefully ignore errors while reading an RCFile.


This addresses bug HIVE-2404.
    https://issues.apache.org/jira/browse/HIVE-2404


Diffs
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 1161660 

Diff: https://reviews.apache.org/r/1660/diff


Testing
-------

Manual testing with a corrupt RC file


Thanks,

Ramkumar




        

[jira] [Commented] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096310#comment-13096310 ] 

jiraposter@reviews.apache.org commented on HIVE-2404:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1671/#review1740
-----------------------------------------------------------



trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
<https://reviews.apache.org/r/1671/#comment3962>

    OK, will do that.



trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
<https://reviews.apache.org/r/1671/#comment3961>

    The difference is that ret.resetValid(columnNumber) should be called when tolerateCorruptions is true.


- Ramkumar


On 2011-08-27 23:22:07, Ramkumar Vadali wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1671/
> -----------------------------------------------------------
>
> (Updated 2011-08-27 23:22:07)
>
> Review request for Yongqiang He and Paul Yang.
>
> Summary
> -------
>
> Sometimes it is useful to tolerate corruptions during a query and return results based on the files that can be processed. A single corrupt block of data should not prevent reading the rest of the data.
>
> We need a way to gracefully ignore errors while reading a RC File
>
> This addresses bug HIVE-2404.
>     https://issues.apache.org/jira/browse/HIVE-2404
>
> Diffs
> -----
>
>   trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java 1161660
>   trunk/ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 1161660
>
> Diff: https://reviews.apache.org/r/1671/diff
>
> Testing
> -------
>
> Manual test with corrupt RC file, added unit-test
>
> Thanks,
>
> Ramkumar




        

[jira] [Updated] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

Posted by "Ramkumar Vadali (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali updated HIVE-2404:
----------------------------------

    Attachment: toleratecorruptions.3.patch

Latest patch after code review.


        

[jira] [Updated] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

Posted by "Ramkumar Vadali (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali updated HIVE-2404:
----------------------------------

    Attachment: toleratecorruptions.2.patch

Patch with unit-test


        

[jira] [Resolved] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang resolved HIVE-2404.
--------------------------------

    Resolution: Fixed

Committed, thanks Ramkumar!


        

[jira] [Commented] (HIVE-2404) Allow RCFile Reader to tolerate corruptions

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096296#comment-13096296 ] 

jiraposter@reviews.apache.org commented on HIVE-2404:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1671/#review1737
-----------------------------------------------------------



trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
<https://reviews.apache.org/r/1671/#comment3960>

    Use conf.getBoolean(, false) here. There is no functional difference at all, but it guards against the small chance that the variable 'tolerateCorruptions' gets changed somewhere.
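
    For readers unfamiliar with the pattern in this comment, a rough sketch of "look the flag up with an explicit default of false" follows. The first argument is elided in the comment above and is left that way; purely as an assumption for illustration, the sketch uses the property name from the patch description, hive.io.rcfile.tolerate.corruptions, as the key. java.util.Properties stands in for the Hadoop configuration object so the example runs on its own, and the class and helper names are hypothetical.

```java
import java.util.Properties;

public class TolerateFlagSketch {

    // Assumption for illustration: the property name given in the patch description.
    public static final String TOLERATE_CORRUPTIONS_KEY = "hive.io.rcfile.tolerate.corruptions";

    /**
     * Hypothetical helper that mimics a getBoolean(name, defaultValue) lookup:
     * returns the default when the key is missing or is not "true"/"false".
     */
    public static boolean getBoolean(Properties conf, String key, boolean defaultValue) {
        String raw = conf.getProperty(key);
        if (raw == null) {
            return defaultValue;
        }
        raw = raw.trim();
        if (raw.equalsIgnoreCase("true")) {
            return true;
        }
        if (raw.equalsIgnoreCase("false")) {
            return false;
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Not set: falls back to the explicit default of false.
        System.out.println(getBoolean(conf, TOLERATE_CORRUPTIONS_KEY, false));
        // Explicitly enabled.
        conf.setProperty(TOLERATE_CORRUPTIONS_KEY, "true");
        System.out.println(getBoolean(conf, TOLERATE_CORRUPTIONS_KEY, false));
    }
}
```

    Passing the literal false at the call site keeps the default visible where the flag is read, which seems to be the point of the suggestion.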



trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
<https://reviews.apache.org/r/1671/#comment3959>

    Let's move the code here to a new method. What do you think?



trunk/ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
<https://reviews.apache.org/r/1671/#comment3958>

    I think the code here can be removed, since currentValueBuffer is always called in next() when tolerateCorruptions is enabled.


- Yongqiang

