Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2012/06/24 03:43:42 UTC

[jira] [Created] (HIVE-3182) Add an option in hive to ignore corrupt data

Namit Jain created HIVE-3182:
--------------------------------

             Summary: Add an option in hive to ignore corrupt data
                 Key: HIVE-3182
                 URL: https://issues.apache.org/jira/browse/HIVE-3182
             Project: Hive
          Issue Type: Bug
            Reporter: Namit Jain
            Assignee: Namit Jain


We have repeatedly seen java.lang.InternalError thrown because of corrupt
input. The corruption may be in LZMA-compressed data or may be of some other
kind. It would be useful to add an option in Hive to ignore corrupt data, or
to ignore internal errors in general. The option should only be enabled to
work around corrupt data.
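
As a rough illustration of the requested behavior (not Hive's actual implementation), the sketch below wraps a record source and, when a flag is set, stops reading the current input on java.lang.InternalError instead of failing. The names CorruptDataSketch, RecordSource, corruptAfter, readAll and ignoreCorruptData are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CorruptDataSketch {

    /** Hypothetical stand-in for a record reader; next() returns null at end of input. */
    public interface RecordSource {
        String next();
    }

    /** Simulates an input that yields some good rows and then hits a corrupt block. */
    public static RecordSource corruptAfter(List<String> goodRows) {
        Iterator<String> it = goodRows.iterator();
        return () -> {
            if (it.hasNext()) {
                return it.next();
            }
            // The issue reports corruption surfacing as java.lang.InternalError.
            throw new InternalError("simulated corrupt compressed block");
        };
    }

    /**
     * Reads rows until end of input. If ignoreCorruptData (a hypothetical option)
     * is true, an InternalError ends the read early and the rows read so far are
     * kept; otherwise the error propagates and the read fails.
     */
    public static List<String> readAll(RecordSource source, boolean ignoreCorruptData) {
        List<String> rows = new ArrayList<>();
        while (true) {
            String row;
            try {
                row = source.next();
            } catch (InternalError e) {
                if (!ignoreCorruptData) {
                    throw e;
                }
                System.err.println("Skipping rest of input: " + e.getMessage());
                break;
            }
            if (row == null) {
                break;
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        RecordSource source = corruptAfter(Arrays.asList("a", "b"));
        System.out.println(readAll(source, true)); // prints [a, b]
    }
}
```

With the flag off, the same call rethrows the InternalError, which is the current failure mode described above.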

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-3182) Add an option in hive to ignore corrupt data

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-3182:
---------------------------------

    Component/s: Serializers/Deserializers

Any options that are added should be narrowly defined and specific to a particular class of errors. We want to avoid the situation where a user enables this property to mask one type of problem only to end up missing other problems which are important.
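
One way to picture a narrowly scoped option is an explicit allow-list of error classes rather than a single "ignore everything" switch. The following is only a sketch under that assumption; NarrowSkipPolicy and shouldIgnore are hypothetical names, not part of Hive.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class NarrowSkipPolicy {
    // Only errors whose exact class is listed here may be ignored.
    private final Set<Class<? extends Throwable>> ignorable;

    @SafeVarargs
    public NarrowSkipPolicy(Class<? extends Throwable>... ignorable) {
        this.ignorable = new HashSet<>(Arrays.asList(ignorable));
    }

    /** Exact class match, so subclasses and unrelated errors still fail the job. */
    public boolean shouldIgnore(Throwable t) {
        return ignorable.contains(t.getClass());
    }

    public static void main(String[] args) {
        NarrowSkipPolicy policy = new NarrowSkipPolicy(InternalError.class);
        System.out.println(policy.shouldIgnore(new InternalError("corrupt block")));  // prints true
        System.out.println(policy.shouldIgnore(new IllegalStateException("bug")));    // prints false
    }
}
```

Matching on the exact class keeps the property from masking other problems, such as an OutOfMemoryError or a genuine bug.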
                
> Add an option in hive to ignore corrupt data
> --------------------------------------------
>
>                 Key: HIVE-3182
>                 URL: https://issues.apache.org/jira/browse/HIVE-3182
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>              Labels: configuration-addition
>
> In many scenarios, we have seen java.lang.InternalError due to corruption.
> This may be due to LZMA or some other kind of corrupt data. It would be
> useful to add an option in hive to ignore corrupt data, or ignore any internal
> errors. Typically, this should only be used to handle corrupt data.


[jira] [Commented] (HIVE-3182) Add an option in hive to ignore corrupt data

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400934#comment-13400934 ] 

Edward Capriolo commented on HIVE-3182:
---------------------------------------

Doesn't Hadoop already offer options for skipping failed rows?
                