You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@orc.apache.org by Alan Gates <al...@gmail.com> on 2017/09/13 22:48:05 UTC

Thoughts on Acid reader

I’ve been looking at the OrcFile.createReader method and thinking about
what I will need to do to read acid files.  The first thing that strikes me
is that createReader takes a file.  But for acid, you need to pass the
directory because it needs to look for any relevant delta files.  Acid also
requires a ValidTxnList.  We can add that to the ReaderOptions.

It seems the best way to do this is to add a new method
OrcFile.createAcidReader that takes a directory.  I don’t like that the
user has to make a different call in the acid case.  But the user will have
to set the ValidTxnList in the reader options anyway, so the user will
already have to have split logic.

Every way I could think of for createReader to decide if it was dealing
with an acid directory or a non-acid file seemed to create jumbled
semantics.
Does the user pass a directory for the acid case but a file for non-acid?
Yuck.
Does the user pass a base file in the acid case and the code walks up the
path to find the relevant directory?  Seems error prone and slow.

Related to this is my assumption that I will need to write a new
implementation of Reader and RecordReader that understand acid.  This seems
better than putting a bunch of branches into the existing code to try to
handle both cases.

Thoughts?

Alan.

Re: Thoughts on Acid reader

Posted by Owen O'Malley <ow...@gmail.com>.

Yeah, I'd suggest adding to:

OrcFile.ReaderOptions:
   exposeAcidRowId(boolean); -- so that the returned schema includes the
ACID row id

Reader.Options:
   setValidTransactions(TransactionList); -- apply transaction filtering

Then it will read a single file (or range using
Reader.Options.range(long,long)).

.. Owen


On Thu, Sep 14, 2017 at 4:52 PM, Gopal Vijayaraghavan <go...@apache.org>
wrote:

> >  For performance reasons, you prefer the second option that I rejected
> >  where users give a file and the system finds the deletes from there.  I
> can
> >  buy that.
>
> That's simpler at least to understand and debug, the logs from ORC alone
> are enough to find consistency issues.
>
> The rest of the details are implicit to the implementation, beyond a base
> file and the current transaction state.
>
> This is nearly exactly how the LLAP ACID cache patch does today, which
> looks the cache up on the base file and applies local transaction state per
> query (i.e valid txns list which hides the committed deletes from an older
> query).
>
> >  I don’t follow your last comment about ROW__ID being projected out to
> the
> > user.  ORC isn’t currently hiding that field from the reader is it?
>
> In general, a BI tool of some kind over ACID probably cares about the data
> and not the metadata about which rows belong to which transaction in
> general.
>
> Hiding ROW__ID makes the consumer side of the reader identical between
> ACID and non-ACID, unless it is being read by a "SELECT FOR UPDATE" reader.
>
> Cheers,
> Gopal
>
>
>

Re: Thoughts on Acid reader

Posted by Gopal Vijayaraghavan <go...@apache.org>.

>  For performance reasons, you prefer the second option that I rejected
>  where users give a file and the system finds the deletes from there.  I can
>  buy that.

That's simpler at least to understand and debug, the logs from ORC alone are enough to find consistency issues.

The rest of the details are implicit to the implementation, beyond a base file and the current transaction state.

This is nearly exactly how the LLAP ACID cache patch does today, which looks the cache up on the base file and applies local transaction state per query (i.e valid txns list which hides the committed deletes from an older query).

>  I don’t follow your last comment about ROW__ID being projected out to the
> user.  ORC isn’t currently hiding that field from the reader is it?

In general, a BI tool of some kind over ACID probably cares about the data and not the metadata about which rows belong to which transaction in general.

Hiding ROW__ID makes the consumer side of the reader identical between ACID and non-ACID, unless it is being read by a "SELECT FOR UPDATE" reader.

Cheers,
Gopal

Re: Thoughts on Acid reader

Posted by Alan Gates <al...@gmail.com>.

For performance reasons, you prefer the second option that I rejected
where users give a file and the system finds the deletes from there.  I can
buy that.

As for passing splits rather than files, that makes sense but seems like a
bigger change, since this should work with and without ACID, so I’ll leave
that for a later time.

I don’t follow your last comment about ROW__ID being projected out to the
user.  ORC isn’t currently hiding that field from the reader is it?

Alan.

On Wed, Sep 13, 2017 at 8:47 PM, Gopal Vijayaraghavan <go...@apache.org>
wrote:

> > The first thing that strikes me is that createReader takes a file.
> > But for acid, you need to pass the directory because it needs to look
> for any relevant delta files.
>
> The ACID 2.x impl, the InputFormat gets a directory - but a Reader should
> still be getting an individual file.
>
> In fact, it should be getting something smaller than a File (ideally an
> HDFS block) which is encoded as FileSplit (path, offset, len) and a
> ValidTxnList.
>
> In the original ACID impl, multiple delta files get a single Reader, while
> in the new ACID 2.x impl the concept of a "base file + deltas" is
> irrelevant.
>
> All insert deltas are equivalent base files - the base concept is (1
> Stripe) + (Relevant deletes) == 1 vector reader.
>
> There's no need to know which delete deltas were already read unlike the
> UPDATE ops (i.e Split #1 and Split #2 can load their deletes independently,
> without worrying about double row outputs).
>
> If a delete delta which was loaded is not found in the input split, it has
> no effect on the reader's output correctness.
>
> > I don’t like that the user has to make a different call in the acid case.
>
> You need to identify within ORC whether the file provided is a base file
> or a delta insert file.
>
> If the file is a base_xx, then all deletes exist in ./delete_deltas_*/
>
> if the file is named delta_xx, then all deletes exist in ../delete_deltas*/
>
> Those are strictly enforced by the ACID implementation & can serve as easy
> assumptions.
>
> Other than that, a non-null ValidTxnList is all it should take.
>
> >   the user will already have to have split logic.
>
> The part where the logic-splits off is into InputFormat - detecting
> compaction during split generation is strictly the InputFormat's problem.
>
> There's a bit of magic there which is in plain sight, like how the INSERT
> OVERWRITE works transactionally (HIVE-14988).
>
> For me, the clear division is to look at this problem as "Details about
> file names" (includes  HIVE-14535) and "Details about a Stripe" (Reader +
> valid-txns + deletes application).
>
> Everything in the middle is just the same as regular ORC, like PPD.
>
> From the other side of the mirror, the flat ORC API is pretty much a Null
> ROW__ID pruned already, with 0 deletes and Long.MAX watermark in the
> ReaderOptions.
>
> > implementation of Reader and RecordReader that understand acid
>
> There's an "*" to most of the above - a reader which intends to modify the
> data might need a different API, to be explicit that the ROW__ID is
> projected out  to the user.
>
> Cheers,
> Gopal
>
>
>
> On 9/13/17, 3:48 PM, "Alan Gates" <al...@gmail.com> wrote:
>
>     I’ve been looking at the OrcFile.createReader method and thinking about
>     what I will need to do to read acid files.  The first thing that
> strikes me
>     is that createReader takes a file.  But for acid, you need to pass the
>     directory because it needs to look for any relevant delta files.  Acid
> also
>     requires a ValidTxnList.  We can add that to the ReaderOptions.
>
>     It seems the best way to do this is to add a new method
>     OrcFile.createAcidReader that takes a directory.  I don’t like that the
>     user has to make a different call in the acid case.  But the user will
> have
>     to set the ValidTxnList in the reader options anyway, so the user will
>     already have to have split logic.
>
>     Every way I could think of for createReader to decide if it was dealing
>     with an acid directory or a non-acid file seemed to create jumbled
>     semantics.
>     Does the user pass a directory for the acid case but a file for
> non-acid?
>     Yuck.
>     Does the user pass a base file in the acid case and the code walks up
> the
>     path to find the relevant directory?  Seems error prone and slow.
>
>     Related to this is my assumption that I will need to write a new
>     implementation of Reader and RecordReader that understand acid.  This
> seems
>     better than putting a bunch of branches into the existing code to try
> to
>     handle both cases.
>
>     Thoughts?
>
>     Alan.
>
>
>
>

Re: Thoughts on Acid reader

Posted by Gopal Vijayaraghavan <go...@apache.org>.

> The first thing that strikes me is that createReader takes a file.
> But for acid, you need to pass the directory because it needs to look for any relevant delta files.

The ACID 2.x impl, the InputFormat gets a directory - but a Reader should still be getting an individual file.

In fact, it should be getting something smaller than a File (ideally an HDFS block) which is encoded as FileSplit (path, offset, len) and a ValidTxnList.

In the original ACID impl, multiple delta files get a single Reader, while in the new ACID 2.x impl the concept of a "base file + deltas" is irrelevant.

All insert deltas are equivalent base files - the base concept is (1 Stripe) + (Relevant deletes) == 1 vector reader.

There's no need to know which delete deltas were already read unlike the UPDATE ops (i.e Split #1 and Split #2 can load their deletes independently, without worrying about double row outputs).

If a delete delta which was loaded is not found in the input split, it has no effect on the reader's output correctness.

> I don’t like that the user has to make a different call in the acid case.

You need to identify within ORC whether the file provided is a base file or a delta insert file.

If the file is a base_xx, then all deletes exist in ./delete_deltas_*/

if the file is named delta_xx, then all deletes exist in ../delete_deltas*/

Those are strictly enforced by the ACID implementation & can serve as easy assumptions.

Other than that, a non-null ValidTxnList is all it should take.

> the user will already have to have split logic.

The part where the logic-splits off is into InputFormat - detecting compaction during split generation is strictly the InputFormat's problem.

There's a bit of magic there which is in plain sight, like how the INSERT OVERWRITE works transactionally (HIVE-14988).

For me, the clear division is to look at this problem as "Details about file names" (includes HIVE-14535) and "Details about a Stripe" (Reader + valid-txns + deletes application).

Everything in the middle is just the same as regular ORC, like PPD.

From the other side of the mirror, the flat ORC API is pretty much a Null ROW__ID pruned already, with 0 deletes and Long.MAX watermark in the ReaderOptions.

> implementation of Reader and RecordReader that understand acid

There's an "*" to most of the above - a reader which intends to modify the data might need a different API, to be explicit that the ROW__ID is projected out to the user.

Cheers,
Gopal

On 9/13/17, 3:48 PM, "Alan Gates" <al...@gmail.com> wrote:

I’ve been looking at the OrcFile.createReader method and thinking about
what I will need to do to read acid files. The first thing that strikes me
is that createReader takes a file. But for acid, you need to pass the
directory because it needs to look for any relevant delta files. Acid also
requires a ValidTxnList. We can add that to the ReaderOptions.

It seems the best way to do this is to add a new method
OrcFile.createAcidReader that takes a directory. I don’t like that the
user has to make a different call in the acid case. But the user will have
to set the ValidTxnList in the reader options anyway, so the user will
already have to have split logic.

Every way I could think of for createReader to decide if it was dealing
with an acid directory or a non-acid file seemed to create jumbled
semantics.
Does the user pass a directory for the acid case but a file for non-acid?
Yuck.
Does the user pass a base file in the acid case and the code walks up the
path to find the relevant directory? Seems error prone and slow.

Related to this is my assumption that I will need to write a new
implementation of Reader and RecordReader that understand acid. This seems
better than putting a bunch of branches into the existing code to try to
handle both cases.

Thoughts?

Alan.