Posted to user@hive.apache.org by David Morin <mo...@gmail.com> on 2019/12/01 16:57:08 UTC

Re: ORC: duplicate record - rowid meaning ?

Hi Peter,

At the moment I have a pipeline based on Flink that writes ORC files. These ORC files can be read from Hive through external tables, and then a MERGE statement (triggered by Oozie) pushes the data into tables managed by Hive (transactional tables => ORC). The Hive version is 2.1, because that is the one provided by HDP 2.6.5.
We've developed a system that writes the Hive delta files for the managed tables directly from Flink.
The current streaming APIs for Hive 2 are not suitable for our needs, and we cannot use the new Hive 3 streaming API yet. This system uses Flink state to store the Hive metadata (originalTransaction, bucket, rowId, ...).
Thanks for your reply, because yes, when the files are ordered by (originalTransaction, bucket, rowId)
it works! I just have to use one transaction instead of two at the moment and it will be OK.

Thanks
David
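
For reference, a minimal sketch of what writing such a delta file can look like with the Apache ORC core writer. It mirrors the ACID record layout visible in the dumps quoted below; it is an illustration only, not Hive's own OrcRecordUpdater (no ACID footer metadata, stats, etc.), and the path, the inner table columns "ts"/"cle", and the values are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.StructColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class DeltaFileSketch {
  public static void main(String[] args) throws Exception {
    // The ACID wrapper schema; the inner "row" struct is table-specific.
    TypeDescription schema = TypeDescription.fromString(
        "struct<operation:int,originalTransaction:bigint,bucket:int,"
        + "rowId:bigint,currentTransaction:bigint,row:struct<ts:bigint,cle:bigint>>");
    Writer writer = OrcFile.createWriter(
        new Path("/tmp/delta_0011365_0011365_0000/bucket_00003"),  // hypothetical path
        OrcFile.writerOptions(new Configuration()).setSchema(schema));

    VectorizedRowBatch batch = schema.createRowBatch();  // default capacity: 1024 rows
    long txn = 11365;
    for (long rowId = 0; rowId < 2; rowId++) {           // rows ordered by rowId within the bucket
      int i = batch.size++;
      ((LongColumnVector) batch.cols[0]).vector[i] = 0;     // operation 0 = insert
      ((LongColumnVector) batch.cols[1]).vector[i] = txn;   // originalTransaction
      ((LongColumnVector) batch.cols[2]).vector[i] = 3;     // bucket
      ((LongColumnVector) batch.cols[3]).vector[i] = rowId; // rowId
      ((LongColumnVector) batch.cols[4]).vector[i] = txn;   // currentTransaction
      StructColumnVector row = (StructColumnVector) batch.cols[5];
      ((LongColumnVector) row.fields[0]).vector[i] = System.nanoTime(); // ts
      ((LongColumnVector) row.fields[1]).vector[i] = 5216 + rowId;      // cle
    }
    writer.addRowBatch(batch);
    writer.close();
  }
}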

On 2019/11/29 11:18:05, Peter Vary <pv...@cloudera.com> wrote: 
> Hi David,
> 
> Not entirely sure what you are doing here :), but my guess is that you are trying to write ACID tables outside of Hive. Am I right? What is the exact use case? There might be better solutions out there than writing the files by hand.
> 
> As for your question below: yes, the files should be ordered by the (originalTransaction, bucket, rowId) triple, otherwise you will get wrong results.
> 
> Thanks,
> Peter
> 
> > On Nov 19, 2019, at 13:30, David Morin <mo...@gmail.com> wrote:
> > 
> > Here are more details about the ORC content and the duplicate rows we see:
> > 
> > /delta_0011365_0011365_0000/bucket_00003
> > 
> > {"operation":0,"originalTransaction":11365,"bucket":3,"rowId":0,"currentTransaction":11365,"row":{"TS":1574156027915254212,"cle":5218,...}}
> > {"operation":0,"originalTransaction":11365,"bucket":3,"rowId":1,"currentTransaction":11365,"row":{"TS":1574156027915075038,"cle":5216,...}}
> > 
> > 
> > /delta_0011368_0011368_0000/bucket_00003
> > 
> > {"operation":2,"originalTransaction":11365,"bucket":3,"rowId":1,"currentTransaction":11368,"row":null}
> > {"operation":2,"originalTransaction":11365,"bucket":3,"rowId":0,"currentTransaction":11368,"row":null}
> > 
> > /delta_0011369_0011369_0000/bucket_00003
> > 
> > {"operation":0,"originalTransaction":11369,"bucket":3,"rowId":1,"currentTransaction":11369,"row":{"TS":1574157407855174144,"cle":5216,...}}
> > {"operation":0,"originalTransaction":11369,"bucket":3,"rowId":0,"currentTransaction":11369,"row":{"TS":1574157407855265906,"cle":5218,...}}
> > 
> > +-------------------------------------------------+-------+--+
> > |                     row__id                     |  cle  |
> > +-------------------------------------------------+-------+--+
> > | {"transactionid":11367,"bucketid":0,"rowid":0}  | 5209  |
> > | {"transactionid":11369,"bucketid":0,"rowid":0}  | 5211  |
> > | {"transactionid":11369,"bucketid":1,"rowid":0}  | 5210  |
> > | {"transactionid":11369,"bucketid":2,"rowid":0}  | 5214  |
> > | {"transactionid":11369,"bucketid":2,"rowid":1}  | 5215  |
> > | {"transactionid":11365,"bucketid":3,"rowid":0}  | 5218  |
> > | {"transactionid":11365,"bucketid":3,"rowid":1}  | 5216  |
> > | {"transactionid":11369,"bucketid":3,"rowid":1}  | 5216  |
> > | {"transactionid":11369,"bucketid":3,"rowid":0}  | 5218  |
> > | {"transactionid":11369,"bucketid":4,"rowid":0}  | 5217  |
> > | {"transactionid":11369,"bucketid":4,"rowid":1}  | 5213  |
> > | {"transactionid":11369,"bucketid":7,"rowid":0}  | 5212  |
> > +-------------------------------------------------+-------+--+
> > 
> > As you can see, we have duplicate rows for "cle" values 5216 and 5218.
> > Do we have to keep the rowIds ordered? That is the only difference I have noticed in some tests with Beeline.
> > 
> > Thanks
> > 
> > 
> > 
> > On Tue, Nov 19, 2019 at 00:18, David Morin <morin.david.bzh@gmail.com> wrote:
> > Hello,
> > 
> > I'm trying to understand the purpose of the rowId column inside an ORC delta file:
> > {"transactionid":11359,"bucketid":5,"rowid":0}
> > ORC view: {"operation":0,"originalTransaction":11359,"bucket":5,"rowId":0,"currentTransaction":11359,"row":...}
> > I use HDP 2.6 => Hive 2
> > 
> > If I want to be idempotent with an INSERT / DELETE / INSERT sequence, do we have to keep the same rowId?
> > It seems that when the rowId changes during the second INSERT, I get a duplicate row.
> > I assumed I could simply create a new rowId for the new transaction during the second INSERT, but that seems to generate duplicate records.
> > 
> > Regards,
> > David
> > 
> > 
> > 
> 
> 
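
For reference, the ordering Peter describes in his reply quoted above, written out as a comparator. AcidKey is a hypothetical holder for the three fields, not a Hive class; a sketch only:

import java.util.Comparator;

final class AcidKey {
  long originalTransaction;
  int bucket;
  long rowId;
}

final class AcidKeyOrder {
  // Sort events exactly as the reader expects to see them:
  // originalTransaction, then bucket, then rowId, all ascending.
  static final Comparator<AcidKey> ASCENDING =
      Comparator.<AcidKey>comparingLong(k -> k.originalTransaction)
                .thenComparingInt(k -> k.bucket)
                .thenComparingLong(k -> k.rowId);
}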

Re: ORC: duplicate record - rowid meaning ?

Posted by Peter Vary <pv...@cloudera.com>.
Thanks David,
Hope that Hive 3 streaming will soon help you avoid these kinds of headaches :)
Peter


Re: ORC: duplicate record - rowid meaning ?

Posted by Peter Vary <pv...@cloudera.com>.
Hey David,

Good to know it was fixed! :)

Thanks,
Peter


Re: ORC: duplicate record - rowid meaning ?

Posted by David Morin <mo...@gmail.com>.
Hi Peter,

Just to give some news concerning my issue: the problem is fixed.
It was caused by a rowId reset in my application: the default batch size of a VectorizedRowBatch (ORC) is 1024, and when that batch was reset, the rowId was reset along with it.
It now works as expected.

Thanks
David
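
A sketch of the pattern behind the fix, reusing the orc-core / hive-storage-api types from the sketch in the first message ("records" stands in for whatever the application actually iterates): the rowId counter has to live outside the batch, because VectorizedRowBatch.reset() is called every 1024 rows with the default batch size.

static void writeBucket(Writer writer, TypeDescription schema,
                        Iterable<long[]> records) throws IOException {
  VectorizedRowBatch batch = schema.createRowBatch(); // capacity defaults to 1024
  long rowId = 0;                                     // lives OUTSIDE the batch: survives reset()
  for (long[] rec : records) {
    if (batch.size == batch.getMaxSize()) {
      writer.addRowBatch(batch);
      batch.reset();   // clears the batch contents, but must NOT restart rowId
    }
    int i = batch.size++;
    ((LongColumnVector) batch.cols[3]).vector[i] = rowId++; // the rowId column
    // ... fill operation / originalTransaction / bucket / currentTransaction / row from rec ...
  }
  if (batch.size > 0) {
    writer.addRowBatch(batch); // flush the final partial batch
  }
}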

Re: ORC: duplicate record - rowid meaning ?

Posted by David Morin <mo...@gmail.com>.
Ok, Peter, no problem. Thx.
I'll keep you posted.

Re: ORC: duplicate record - rowid meaning ?

Posted by Peter Vary <pv...@cloudera.com>.
Hi David,

I'm more familiar with ACID v2 :(
What I would do is run an UPDATE operation with your version of Hive and see how it handles this case.

It would be nice to hear back from you if you find something.

Thanks,
Peter

Re: ORC: duplicate record - rowid meaning ?

Posted by David Morin <mo...@gmail.com>.
Hello,

Thanks.
In fact I use HDP 2.6.5, i.e. the previous ORC ACID format, with transactionid and the update operation flag.
Sorry, it would have been easier with the row__id.

So, here is the ORC file content (from hive --orcfiledump):
hive --orcfiledump hdfs://XXXX/delta_0198994_0198994_0000/bucket_00000 
{"operation":0,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":198994,"row":{...}}

hive --orcfiledump hdfs://XXXX/delta_0199073_0199073_0000/bucket_00000 
{"operation":1,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":199073,"row":{...}}

=> When I execute a SELECT statement on the PK, I see 2 records: the two events are not linked.
Still, the metadata looks good.
The only difference I see is that I mix operations within the ORC files.
When I perform a MERGE statement, for example, 2 directories are created:
 hdfs://XXXX/delta_0199073_0199073_0000
 hdfs://XXXX/delta_0199073_0199073_0002
And the first one contains updates (operation:1) and the second one inserts (operation:0).

And here are the row__id values:

(related file: hdfs://XXXX/delta_0198994_0198994_0000/bucket_00000)
row__id    {"transactionid":198994,"bucketid":0,"rowid":14}
ts         1580751316822642886 (5h35)
id         764925

(related file: hdfs://XXXX/delta_0199073_0199073_0000/bucket_00000 )
row__id    {"transactionid":198994,"bucketid":0,"rowid":14}
ts         1580757191368914630 (7h13)
id         764925

=> And I get one duplicated value if I execute a query based on the id (primary key).

So I don't understand, because transactionid = originalTransaction and all the metadata seems to be good.
Probably a problem in the sort, but I follow the rule that data are ordered by (originalTransaction, bucketId, rowId) ascending and currentTransaction descending. It works pretty well except for some tables with lots of updates.
The only thing I can see at the moment is that I mix different types of operations in one bucket file. A MERGE query, for example, creates different directories (one per operation).

David
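
For reference, sorting the two events above by that rule puts the newest event first for the key (originalTransaction=198994, bucket=0, rowId=14):

  currentTransaction=199073  operation 1 (the update)  <- sorts first, should win
  currentTransaction=198994  operation 0 (the insert)  <- should be masked

so a reader that collapses events per key should return only the updated row; getting both rows back suggests the two events were never grouped under the same key.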


Re: ORC: duplicate record - rowid meaning ?

Posted by Peter Vary <pv...@cloudera.com>.
Hi David,

There is no tombstone for the updated record.
In ACID v2 there is no in-place update of rows, only insert and delete, so an update is handled as a delete of the old row plus an insert of a new, independent row.
The deletes are stored in the delete_delta directories, and those files do not have to contain the {row} struct at the end.
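As a sketch (reusing the transaction ids from the dumps later in this thread; directory names and the trailing statement-id suffix can vary by Hive version), an update of the row written in transaction 198994 would then show up as two events:

/delete_delta_0199073_0199073_0000/bucket_00000
{"operation":2,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":199073,"row":null}

/delta_0199073_0199073_0000/bucket_00000
{"operation":0,"originalTransaction":199073,"bucket":0,"rowId":0,"currentTransaction":199073,"row":{...}}

The delete event keeps the (originalTransaction, bucket, rowId) of the row it masks, while the insert event gets a fresh rowId under the new transaction.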

Hope this helps,
Peter


Re: ORC: duplicate record - rowid meaning ?

Posted by David Morin <mo...@gmail.com>.
Hi,

It works pretty well, but problems still occur sometimes.
Do we have to separate operations?

Here is the content of the ORC files:
hive --orcfiledump hdfs://XXXX/delta_0198994_0198994_0000/bucket_00000 
{"operation":0,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":198994,"row":{...}}

hive --orcfiledump hdfs://XXXX/delta_0199073_0199073_0000/bucket_00000 
{"operation":1,"originalTransaction":198994,"bucket":0,"rowId":14,"currentTransaction":199073,"row":{...}}

=> When I execute a SELECT statement on the PK, I see 2 records: the update event is not matched with the original row.
Still, the metadata looks correct.
The only difference I see is that I mix operations within the same ORC file.
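For reference, this is the kind of check I run from beeline to see which (transactionid, bucketid, rowid) triples survive the merge (the table name is a placeholder; "cle" is the key column used earlier in the thread, and 5218 is just an example key):

SELECT ROW__ID, cle FROM my_acid_table WHERE cle = 5218;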
When I perform a MERGE statement, for example, 2 directories are created:
 hdfs://XXXX/delta_0199073_0199073_0000
 hdfs://XXXX/delta_0199073_0199073_0002
The first one contains the updates (operation:1) and the second one the inserts (operation:0).
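Schematically, the MERGE has this shape (table and column names are simplified placeholders, not our real schema):

MERGE INTO target_acid t
USING staging_ext s
ON t.cle = s.cle
WHEN MATCHED THEN UPDATE SET ts = s.ts
WHEN NOT MATCHED THEN INSERT VALUES (s.ts, s.cle);

The WHEN MATCHED branch produces the operation:1 events in the first delta directory, and the WHEN NOT MATCHED branch the operation:0 events in the second one.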

Thanks for your help
David
