Posted to dev@hive.apache.org by wanghaifei <wa...@jd.com> on 2015/03/04 06:20:21 UTC

How to merge the data for hive-0.14

Dear Sir,
  Thank you for reading this mail.
  I have a question about Hive 0.14. To support update and delete operations, an identifier was added to each row. When I run a SELECT, Hive 0.14 reads every delta bucket. This means that what is read contains both the original data and the modified (or deleted) data, so I don't understand where the data is merged.
  I hope you can answer.
  Thank you once again.



Wang Haifei (王海飞), JD Finance, Technology and Data Services Department

Re: How to merge the data for hive-0.14

Posted by Alan Gates <al...@gmail.com>.
When data is written to a table marked as transactional, each row contains
a transaction id and a row id. At read time the reader merges the records
on these transaction ids and row ids and chooses which version to return
to the query based on the reader's transaction state.

For example, you might have a row in the base file with rowid 1 and
transaction id 2, and a row in the delta file with rowid 1 and
transaction id 3. The reader would then merge these two rows and use its
transaction state to determine whether it shouldn't see the row at all
(which would be the case if the reader's latest transaction was 1),
should see the version from transaction 2, or should see the version
from transaction 3.

Alan.
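
The read-time merge described above can be illustrated with a small Python
sketch. This is not Hive's reader code. The names Record and merge_read are
hypothetical, a delete is modelled as a record whose value is None, and the
reader's transaction state is reduced to a single "latest visible
transaction" number, all of which are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    row_id: int
    txn_id: int
    value: Optional[str]  # assumption: None marks a delete in this sketch


def merge_read(base, deltas, reader_max_txn):
    """Return {row_id: value} as seen by a reader whose latest
    visible transaction is reader_max_txn (a simplification)."""
    latest = {}
    all_records = list(base) + [rec for delta in deltas for rec in delta]
    for rec in all_records:
        if rec.txn_id > reader_max_txn:
            continue  # written by a transaction this reader cannot see
        seen = latest.get(rec.row_id)
        if seen is None or rec.txn_id > seen.txn_id:
            latest[rec.row_id] = rec  # keep the newest visible version
    # Drop rows whose newest visible version is a delete.
    return {rid: rec.value for rid, rec in latest.items()
            if rec.value is not None}


# The example from the reply: rowid 1 in the base file (transaction 2)
# and rowid 1 in a delta file (transaction 3).
base = [Record(1, 2, "original")]
delta = [Record(1, 3, "updated")]

print(merge_read(base, [delta], 1))  # {}
print(merge_read(base, [delta], 2))  # {1: 'original'}
print(merge_read(base, [delta], 3))  # {1: 'updated'}
```

The point of the sketch is that no merged copy has to exist on disk for a
query to see a consistent result: the base and delta records are combined
per row id at read time, and the reader's transaction state decides which
version, if any, is returned.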

> wanghaifei <ma...@jd.com>
> March 3, 2015 at 21:20
> Dear Sir,
> Thank you for reading this mail.
> I have a question about Hive 0.14. To support update and delete
> operations, an identifier was added to each row. When I run a SELECT,
> Hive 0.14 reads every delta bucket. This means that what is read
> contains both the original data and the modified (or deleted) data,
> so I don't understand where the data is merged.
> I hope you can answer.
> Thank you once again.
>
>
>
> Wang Haifei (王海飞), JD Finance, Technology and Data Services Department