Posted to dev@hive.apache.org by "Ning Zhang (JIRA)" <ji...@apache.org> on 2011/05/07 17:14:07 UTC

[jira] [Commented] (HIVE-1968) data corruption with multi-table insert

    [ https://issues.apache.org/jira/browse/HIVE-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030360#comment-13030360 ] 

Ning Zhang commented on HIVE-1968:
----------------------------------

@Joydeep, Yongqiang and I tried to reproduce the bug but couldn't. We tried different query patterns (1 map-only job + 1 mapreduce job, and dynamic partition inserts) on both small and large data sets, and all of them worked as expected. Without a concrete example it's very hard to say this is a bug in multi-table inserts. Is there any chance you could dig into your query log and find the specific query?

> data corruption with multi-table insert
> ---------------------------------------
>
>                 Key: HIVE-1968
>                 URL: https://issues.apache.org/jira/browse/HIVE-1968
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.7.0
>            Reporter: Joydeep Sen Sarma
>
> I had to run a conversion process to compute a checksum (sum(hash(all-columns))) of a table and convert it to a different compression format. Trying to be clever, I did both in a single pass with something equivalent to:
> from (select col1, col2, hash(col1, col2) as val from table_to_be_converted) i
> insert overwrite table table_to_be_generated select i.col1, i.col2
> insert overwrite table table_to_be_converted_checksum select sum(hash(i.val));
> The plan looked correct. However, the data produced was erroneous: the checksums and the data were both wrong (and consistent with each other). I know this because:
> - the checksum computed by the above query didn't match the checksum of the input table when calculated separately
> - the checksum of the data output by this query (first insert clause) didn't match the input table's checksum (neither the one computed by the query above, nor the one computed separately)
> Later on, I broke this query up into two independent ones, and the data and checksums were good (i.e. they all matched up). So it seems like there's some data corruption happening in MTI (multi-table insert).
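
For anyone trying to reproduce this, a sketch of what splitting the multi-table insert into two independent queries might look like, followed by the cross-check described in the report. The reporter did not post the split queries, so this is a hypothetical reconstruction: it assumes the same table and column names as the quoted query, and that table_to_be_converted_checksum has a single numeric column. The nested hash(hash(col1, col2)) mirrors the original sum(hash(i.val)), where val was already hash(col1, col2).

```sql
-- Hypothetical reconstruction; not the reporter's actual queries.

-- Query 1: the format conversion on its own (the first insert clause).
insert overwrite table table_to_be_generated
select col1, col2 from table_to_be_converted;

-- Query 2: the checksum on its own (the second insert clause).
-- hash(hash(col1, col2)) matches sum(hash(i.val)) with val = hash(col1, col2).
insert overwrite table table_to_be_converted_checksum
select sum(hash(hash(col1, col2))) from table_to_be_converted;

-- Cross-check: the same checksum over the converted output. If no rows were
-- lost or altered, this should equal the value stored by query 2.
select sum(hash(hash(col1, col2))) from table_to_be_generated;
```

Because sum() is order-independent, the cross-check should hold even though the converted table may store rows in a different order or compression format.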
