
Posted to dev@hive.apache.org by "Namit Jain (Created) (JIRA)" <ji...@apache.org> on 2011/12/15 23:01:31 UTC

[jira] [Created] (HIVE-2658) add a option in hive to skip corrupted data entirely

add a option in hive to skip corrupted data entirely
----------------------------------------------------

                 Key: HIVE-2658
                 URL: https://issues.apache.org/jira/browse/HIVE-2658
             Project: Hive
          Issue Type: New Feature
            Reporter: Namit Jain
            Assignee: He Yongqiang


Add a new parameter:

hive.skip.corrupted.data

This is independent of the type of the underlying data.

The idea is as follows:

We currently have some corrupted data in our cluster.
We will run Hive over all the corrupted partitions:

set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.skip.corrupted.data=true;

insert overwrite table <T> partition <P>
select * from <T> where <P>;

This way, <T>@<P> will be regenerated with all the data that can be read.

If HiveRecordReader gets an exception while fetching the next row, the mapper will behave as if no more data is present in the file.
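
The behaviour described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not Hive code: RowSource, SkipCorruptedSource and SkipCorruptedDemo are hypothetical names, the boolean flag only models the proposed hive.skip.corrupted.data parameter, and catching IOException is an assumption, since the proposal does not say which exception type would signal corruption.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a record reader; this is not Hive's HiveRecordReader API. */
interface RowSource {
    /** Returns the next row, or null when the file is exhausted. */
    String next() throws IOException;
}

/** Wraps a RowSource and, when skipping is enabled, treats a read error as end of file. */
final class SkipCorruptedSource implements RowSource {
    private final RowSource delegate;
    private final boolean skipCorrupted; // models the proposed hive.skip.corrupted.data flag
    private boolean done = false;

    SkipCorruptedSource(RowSource delegate, boolean skipCorrupted) {
        this.delegate = delegate;
        this.skipCorrupted = skipCorrupted;
    }

    @Override
    public String next() throws IOException {
        if (done) {
            return null; // never touch the underlying source again after end of file
        }
        try {
            String row = delegate.next();
            if (row == null) {
                done = true;
            }
            return row;
        } catch (IOException e) { // assumption: corruption surfaces as an IOException
            if (!skipCorrupted) {
                throw e; // default behaviour: the task fails
            }
            done = true; // behave as if no more data is present in the file
            return null;
        }
    }
}

final class SkipCorruptedDemo {
    /** Fake source that yields the given rows, then fails as if it hit a corrupt block. */
    static RowSource failingAfter(List<String> rows) {
        return new RowSource() {
            private int i = 0;

            @Override
            public String next() throws IOException {
                if (i < rows.size()) {
                    return rows.get(i++);
                }
                throw new IOException("simulated corrupt block");
            }
        };
    }

    /** Drains a source the way a mapper loop would, stopping at the first null. */
    static List<String> readAll(RowSource source) throws IOException {
        List<String> out = new ArrayList<>();
        for (String row = source.next(); row != null; row = source.next()) {
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        RowSource src = new SkipCorruptedSource(failingAfter(List.of("r1", "r2")), true);
        System.out.println(readAll(src)); // prints [r1, r2]
    }
}
```

With the flag off, the same read fails with the IOException. With it on, everything after the first unreadable row in that file is silently dropped, which is the trade-off the proposal accepts.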

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (HIVE-2658) add a option in hive to skip corrupted data entirely

Posted by "Namit Jain (Resolved) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain resolved HIVE-2658.
------------------------------

    Resolution: Won't Fix

Not needed - it will be difficult to come up with a generic exception.

