
Posted to dev@hive.apache.org by Granville Barnett <gr...@gmail.com> on 2018/11/23 12:24:15 UTC

When do the deltas of a transaction become observable?

Hi All,

I'm trying to figure out where in the Hive codebase the deltas produced as
a side effect of a Hive 3.x transaction become observable. (My current
investigation is limited to HDFS.)

For example,

from table1
insert into table2 select x
insert into table3 select x;

This transaction generates two delta files: one that will appear under
the location for table2 and another under the location for table3.

I'm expecting that there's some logic that will make the deltas of this
transaction appear in their respective HDFS locations upon commit (or
release of the locks) but I can't seem to find it. As it's a transactional
system I'd expect we observe both deltas or none at all, at the point of
successful commit.

The only reference to a location I've managed to stumble across is that of
the Hive scratch space: conceptually, I had thought that the intermediate
result of a transaction would be located here and then a rename would occur
to make the content visible to other readers.

I had done some basic tests to determine if the observation semantics were
tied to the metadata in the database product for the transactional system
but I could only determine write IDs were influencing this, e.g. if write
ID = 7 for a given table, then the read would consist of all deltas with a
write ID < 7.

If someone could point me in the right direction, or correct my
understanding then I would greatly appreciate it.

Thanks,

Granville

Re: When do the deltas of a transaction become observable?

Posted by Granville Barnett <gr...@gmail.com>.

Thanks Gopal, that was very helpful.

Granville

On Mon, 26 Nov 2018 at 08:14, Gopal Vijayaraghavan <go...@apache.org>
wrote:


Re: When do the deltas of a transaction become observable?

Posted by Gopal Vijayaraghavan <go...@apache.org>.

 
>    release of the locks) but I can't seem to find it. As it's a transactional
>    system I'd expect we observe both deltas or none at all, at the point of
>    successful commit.

In Hive's internals, "observe" is slightly different from "use". The Hive ACID system
can see a file on HDFS and then ignore it, because it is from the "future".

You can sort of start from this line 

https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hadoop/hive/common/ValidReaderWriteIdList.java#L70

and work backwards.

>    I had done some basic tests to determine if the observation semantics were
>    tied to the metadata in the database product for the transactional system
>    but I could only determine write IDs were influencing this, e.g. if write
>    ID = 7 for a given table, then the read would consist of all deltas with a
>    write ID < 7.

Yes, you're on the right track. There's a mapping from txn_id -> write_id (per-table), maintained by the writers (i.e. if a txn commits, then its write_id becomes visible).
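
A toy model of that per-table indirection may help. This is only a sketch: the class name WriteIdAllocator and its allocate() method are hypothetical and do not correspond to Hive's metastore code. The assumption it illustrates is that each table hands out its own increasing write IDs, so a single txn touching two tables gets one write ID per table.

```python
from collections import defaultdict


class WriteIdAllocator:
    """Hypothetical toy model of a per-table txn_id -> write_id mapping."""

    def __init__(self):
        # Each table has its own counter, starting at 1 (an assumption).
        self._next_write_id = defaultdict(lambda: 1)
        # (txn_id, table) -> write_id
        self._mapping = {}

    def allocate(self, txn_id, table):
        """Return the write ID of txn_id for table, allocating it on first use."""
        key = (txn_id, table)
        if key not in self._mapping:
            self._mapping[key] = self._next_write_id[table]
            self._next_write_id[table] += 1
        return self._mapping[key]


alloc = WriteIdAllocator()
# Txn 100 is the multi-insert from the original question: it writes both tables.
w100_table2 = alloc.allocate(100, "table2")
w100_table3 = alloc.allocate(100, "table3")
# Txn 101 writes only table3, so only table3's counter advances.
w101_table3 = alloc.allocate(101, "table3")
```

Because the counters are per table, the same txn can carry different write IDs in different tables, and a reader only ever needs the write-ID state of the tables it actually scans.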

For each table, in each query, there's a snapshot taken which has a min:max and list of exceptions.

When a query starts, it sees that all txns up to and including 5 are committed or cleaned, therefore everything <=5 is good.

It knows that the highest known txn is 10, so everything >10 is to be ignored.

And between 5 and 10, it knows that 7 is aborted and 8 is still open (i.e. these are the exceptions).

So if it sees a delta_11 dir, it ignores it. If it sees a delta_8, it ignores it too.
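
The rule in this example can be written down as a small predicate. This is a minimal sketch, not the code in ValidReaderWriteIdList: the Snapshot class and its field names are hypothetical, and it models only a high watermark plus a set of exceptions (aborted or still-open IDs).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical reader snapshot: a high watermark plus exceptions."""

    high_watermark: int               # highest ID known when the query started
    exceptions: frozenset = frozenset()  # aborted or still-open IDs at or below it

    def is_visible(self, write_id: int) -> bool:
        # Anything above the high watermark is from the "future".
        if write_id > self.high_watermark:
            return False
        # At or below the watermark, only the listed exceptions are hidden.
        return write_id not in self.exceptions


# The numbers from the example: highest known is 10, 7 is aborted, 8 is open.
snap = Snapshot(high_watermark=10, exceptions=frozenset({7, 8}))
visible = [w for w in range(1, 13) if snap.is_visible(w)]
```

With these numbers, visible contains 1 through 6, 9 and 10. The snapshot is immutable, so a delta that commits after the query starts cannot change what that query reads.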

The "ACID" implementation hides future updates in plain sight and doesn't need HDFS to be able to rename multiple dirs together.

Most of those smarts are in split generation, not in the commit (however, the commit does something else to detect write conflicts, which is its own thing).
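
To make "hidden in plain sight" concrete, here is a rough sketch of a reader filtering a directory listing at planning time. It is not Hive's split-generation code: readable_deltas and its parameters are hypothetical, the directory naming (delta_<min>_<max> with an optional statement suffix) is an assumption about the on-disk layout, and deltas that span a range of write IDs are skipped because they need extra rules.

```python
import re

# Assumed naming: delta_<minWriteId>_<maxWriteId>[_<statementId>]
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)(?:_(\d+))?$")


def readable_deltas(dir_names, high_watermark, exceptions):
    """Return the single-write-ID delta dirs a reader snapshot would use."""
    keep = []
    for name in dir_names:
        match = DELTA_RE.match(name)
        if match is None:
            continue  # not a delta directory at all
        lo, hi = int(match.group(1)), int(match.group(2))
        if lo != hi:
            continue  # multi-ID ranges are out of scope for this sketch
        if hi > high_watermark or hi in exceptions:
            continue  # present on HDFS, but ignored by this reader
        keep.append(name)
    return keep


listing = [
    "delta_0000005_0000005_0000",
    "delta_0000007_0000007_0000",  # aborted
    "delta_0000008_0000008_0000",  # still open
    "delta_0000009_0000009_0000",
    "delta_0000011_0000011_0000",  # above the high watermark
    "_tmp",
]
chosen = readable_deltas(listing, high_watermark=10, exceptions={7, 8})
```

All five delta directories exist in the listing the whole time. The reader picks only the two whose write IDs its snapshot accepts, which is why no atomic multi-directory rename is needed.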

>    If someone could point me in the right direction, or correct my
>    understanding then I would greatly appreciate it.

This implementation is built with the txn -> write_id indirection to support cross-replication between, say, an east-coast cluster and a west-coast cluster,
each owning the primary data sets on its own coast.

Cheers,
Gopal