
Posted to issues@impala.apache.org by "Zoltán Borók-Nagy (Jira)" <ji...@apache.org> on 2020/03/10 18:47:00 UTC

[jira] [Created] (IMPALA-9484) Milestone 1: properly scan files that have full ACID schema

Zoltán Borók-Nagy created IMPALA-9484:
-----------------------------------------

             Summary: Milestone 1: properly scan files that have full ACID schema
                 Key: IMPALA-9484
                 URL: https://issues.apache.org/jira/browse/IMPALA-9484
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Zoltán Borók-Nagy

The full ACID row format looks like this:
{
 "operation": 0,
 "originalTransaction": 1,
 "bucket": 536870912,
 "rowId": 0,
 "currentTransaction": 1,
 "row": {"i": 1}
}
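A minimal sketch of the layout above, in Python purely for illustration. It separates the five ACID metadata fields from the user columns nested under "row". The helper name split_acid_record and the error it raises are hypothetical. They are not part of Impala or Hive.

```python
from __future__ import annotations

# Hypothetical helper, for illustration only: split one full ACID record
# into its ACID metadata fields and the user columns nested under "row".
ACID_FIELDS = (
    "operation",
    "originalTransaction",
    "bucket",
    "rowId",
    "currentTransaction",
)


def split_acid_record(record: dict) -> tuple[dict, dict]:
    """Return (acid_metadata, user_row) for one full ACID record."""
    missing = [f for f in ACID_FIELDS + ("row",) if f not in record]
    if missing:
        raise ValueError(f"not a full ACID record, missing: {missing}")
    acid_meta = {f: record[f] for f in ACID_FIELDS}
    return acid_meta, record["row"]


example = {
    "operation": 0,
    "originalTransaction": 1,
    "bucket": 536870912,
    "rowId": 0,
    "currentTransaction": 1,
    "row": {"i": 1},
}
meta, row = split_acid_record(example)
print(meta["rowId"], row)  # prints: 0 {'i': 1}
```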

User columns are nested under the "row" field. The frontend should create tuple and slot descriptors that reflect this nesting so that the scan nodes read the files correctly.
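One way to picture what the slot descriptors must express: a user column is no longer at the top level of the file schema. It sits one level down, inside "row". The sketch below assumes the top-level field order shown in the example record. It resolves a table column to a (position of "row", position inside "row") path. The function resolve_column_path is hypothetical and is not Impala's frontend API.

```python
from __future__ import annotations

# Hypothetical sketch: top-level fields of a full ACID file, assumed to be
# in the same order as in the example record.
FILE_TOP_LEVEL = (
    "operation",
    "originalTransaction",
    "bucket",
    "rowId",
    "currentTransaction",
    "row",
)
ROW_POS = FILE_TOP_LEVEL.index("row")  # position of the nested "row" struct


def resolve_column_path(table_columns: list[str], column: str) -> tuple[int, int]:
    """Map a user column to (position of "row", position inside "row")."""
    try:
        inner = table_columns.index(column)
    except ValueError:
        raise KeyError(f"unknown column: {column}") from None
    return (ROW_POS, inner)


# Table with user columns (i, s): column "s" is read from file path row.s.
print(resolve_column_path(["i", "s"], "s"))  # prints: (5, 1)
```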

We should be able to query the ACID columns, at least for debugging and testing. Hive uses the special "row__id" identifier for this.
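To illustrate how the ACID columns could be exposed, the sketch below projects a special "row__id" column as a struct of all five ACID fields, next to ordinary user columns. The set of fields placed in row__id is an assumption. Neither Hive's nor Impala's actual row__id layout is specified here. The helper project_with_row_id is hypothetical.

```python
from __future__ import annotations

# Hypothetical sketch: answer a projection where "row__id" selects the ACID
# fields as one struct and every other name selects a user column.
ACID_FIELDS = ("operation", "originalTransaction", "bucket", "rowId", "currentTransaction")


def project_with_row_id(record: dict, columns: list[str]) -> dict:
    """Build an output row for `columns` from one full ACID record."""
    out = {}
    for col in columns:
        if col == "row__id":
            out[col] = {f: record[f] for f in ACID_FIELDS}
        else:
            out[col] = record["row"][col]
    return out


record = {
    "operation": 0,
    "originalTransaction": 1,
    "bucket": 536870912,
    "rowId": 0,
    "currentTransaction": 1,
    "row": {"i": 1},
}
result = project_with_row_id(record, ["row__id", "i"])
print(result["row__id"]["rowId"], result["i"])  # prints: 0 1
```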

Impala should raise an error if the table has delete deltas. Directory filtering should skip minor compacted directories, since the records in those need row-level validation against the validWriteIdList, which this sub-task does not implement.
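A sketch of the directory filtering rule. It assumes Hive-style directory names: base_<writeId>, delta_<min>_<max>[_suffix] and delete_delta_<min>_<max>[_suffix]. It treats any delta whose minimum and maximum write IDs differ as minor compacted. That heuristic is a simplification, because it would also skip any other delta that spans several write IDs. The names filter_acid_dirs and DeleteDeltaError are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical sketch, assuming Hive-style directory names:
#   base_<writeId>, delta_<min>_<max>[_...], delete_delta_<min>_<max>[_...]
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)(?:_.*)?$")
DELETE_DELTA_RE = re.compile(r"^delete_delta_\d+_\d+(?:_.*)?$")


class DeleteDeltaError(Exception):
    """Raised because reading delete deltas is out of scope."""


def filter_acid_dirs(dir_names: list[str]) -> list[str]:
    """Return the directories to scan, skipping minor compacted deltas."""
    selected = []
    for name in dir_names:
        if DELETE_DELTA_RE.match(name):
            raise DeleteDeltaError(f"delete delta found: {name}")
        m = DELTA_RE.match(name)
        if m and int(m.group(1)) != int(m.group(2)):
            # min != max: treated as minor compacted; its records would need
            # row validation, so the directory is skipped.
            continue
        selected.append(name)
    return selected


dirs = ["base_0000005", "delta_0000006_0000006_0000", "delta_0000006_0000008"]
print(filter_acid_dirs(dirs))
# prints: ['base_0000005', 'delta_0000006_0000006_0000']
```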

Non-goals in this sub-task:
 * row validation against validWriteIdList
 * reading "original files" (files in non-ACID format)
 * reading delete deltas



--
This message was sent by Atlassian Jira
(v8.3.4#803005)