Posted to commits@impala.apache.org by cs...@apache.org on 2023/03/09 14:13:26 UTC

[impala] 04/07: IMPALA-11906: [DOCS] Document the support for non-unique primary key

This is an automated email from the ASF dual-hosted git repository.

csringhofer pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 27e238c8cf31e5c28ad0fe63a1aeabf2a6e05414
Author: Shajini Thayasingh <st...@cloudera.com>
AuthorDate: Mon Mar 6 10:49:16 2023 -0800

    IMPALA-11906: [DOCS] Document the support for non-unique primary key
    
    Incorporated the comments received.
    Added a new sub-section.
    Change-Id: I7b5a452f2199d097077150c012497aa4a3ecf7d9
    Reviewed-on: http://gerrit.cloudera.org:8080/19587
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Abhishek Chennaka <ac...@cloudera.com>
    Reviewed-by: Wenzhe Zhou <wz...@cloudera.com>
---
 docs/topics/impala_kudu.xml | 147 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 126 insertions(+), 21 deletions(-)

diff --git a/docs/topics/impala_kudu.xml b/docs/topics/impala_kudu.xml
index 0ae80625f..0bc781b3b 100644
--- a/docs/topics/impala_kudu.xml
+++ b/docs/topics/impala_kudu.xml
@@ -203,6 +203,97 @@ under the License.
       <p outputclass="toc inpage"/>
 
     </conbody>
+    <concept id="non_unique_primary_key">
+      <title>Non-unique Primary Keys for Kudu Tables</title>
+      <conbody>
+        <p>Kudu now lets you declare a non-unique primary key when you create a table. The
+          storage engine handles this by appending a system-generated auto-incrementing column
+          to the non-unique primary key columns, which guarantees that the effective primary
+          key is unique. This auto-incrementing column is named 'auto_incrementing_id' and has
+          the BIGINT type. Its value is assigned automatically during insertion.</p>
+      </conbody>
+    </concept>
+    <concept id="create">
+      <title>Create a Kudu Table with a non-unique PRIMARY KEY</title>
+      <conbody>
+        <p>The following example shows creating a table with a non-unique PRIMARY KEY.</p>
+<codeblock>
+CREATE TABLE kudu_tbl1
+(
+ id INT NON UNIQUE PRIMARY KEY,
+ name STRING
+)
+PARTITION BY HASH (id) PARTITIONS 3 STORED AS KUDU;</codeblock>
+        <p>The effective PRIMARY KEY in the above case is {id, auto_incrementing_id}.</p>
+        <note>The "auto_incrementing_id" column cannot be added, removed, or renamed with ALTER
+          TABLE statements.</note>
+      </conbody>
+    </concept>
+    <concept id="verify">
+      <title>Verify the PRIMARY KEY is non-unique</title>
+      <conbody>
+        <p>To check that the PRIMARY KEY is non-unique, run the following DESCRIBE statement.
+          The new "key_unique" property shows whether the primary key is unique. The
+          system-generated "auto_incrementing_id" column appears in the output as a primary key
+          column with "key_unique" set to false.</p>
+<codeblock>
+  DESCRIBE kudu_tbl1;
+  +----------------------+--------+---------+-------------+------------+----------+---------------+---------------+---------------------+------------+
+  | name                 | type   | comment | primary_key | key_unique | nullable | default_value | encoding      | compression         | block_size |
+  +----------------------+--------+---------+-------------+------------+----------+---------------+---------------+---------------------+------------+
+  | id                   | int    |         | true        | false      | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+  | auto_incrementing_id | bigint |         | true        | false      | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+  | name                 | string |         | false       |            | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+  +----------------------+--------+---------+-------------+------------+----------+---------------+---------------+---------------------+------------+
+  Fetched 3 row(s) in 4.72s
+</codeblock>
+      </conbody>
+    </concept>
+    <concept id="auto_incrementing_col">
+      <title>Query Auto Incrementing Column</title>
+      <conbody>
+        <p>A SELECT statement does not return the system-generated auto-incrementing column
+          unless you name that column explicitly in the select list. A SELECT * query
+          therefore omits it.</p>
+      </conbody>
+    </concept>
+    <concept id="no_primary_key">
+      <title>Create a Kudu table without a PRIMARY KEY attribute</title>
+      <conbody>
+        <p>When you create a Kudu table, the PRIMARY KEY and the partition key are each
+          optional, but you cannot omit both: you must specify at least one of them. If you
+          do not specify a primary key, the partition key columns can be promoted to a
+          non-unique primary key. This is possible only if those columns are the leading
+          columns of the table.</p>
+        <p>In the following example, 'a' and 'b' are promoted to a non-unique primary key, and
+          the Kudu engine adds the 'auto_incrementing_id' column. Together, 'a', 'b', and
+          'auto_incrementing_id' form the effective unique composite primary key.</p>
+        <example>
+<codeblock>
+CREATE TABLE auto_table
+(
+ a BIGINT,
+ b STRING,
+)
+PARTITION BY HASH(a, b) PARTITIONS 2 STORED AS KUDU;
+</codeblock>
+          <p>The effective primary key in this case is {a, b, auto_incrementing_id}.</p>
+        </example>
+      </conbody>
+    </concept>
+    <concept id="limitations">
+      <title>Limitations</title>
+      <conbody>
+        <ul>
+          <li>The UPSERT operation is not supported for Kudu tables with a non-unique primary
+            key. An UPSERT statement against such a table fails with
+            an error.</li>
+          <li>Because the auto-generated key for each row is assigned only after the row’s data
+            is generated and the row lands in the tablet, you cannot use the
+            "auto_incrementing_id" column in the partition key.</li>
+        </ul>
+      </conbody>
+    </concept>
 
     <concept id="kudu_primary_key">
 
@@ -210,14 +301,13 @@ under the License.
 
       <conbody>
 
-        <p>
-          Kudu tables introduce the notion of primary keys to Impala for the first time. The
+        <p> Kudu tables introduce the notion of primary keys to Impala for the first time. The
           primary key is made up of one or more columns, whose values are combined and used as a
-          lookup key during queries. The tuple represented by these columns must be unique and cannot contain any
-          <codeph>NULL</codeph> values, and can never be updated once inserted. For a
-          Kudu table, all the partition key columns must come from the set of
-          primary key columns.
-        </p>
+          lookup key during queries. The declared primary key can be non-unique; in that case,
+          Kudu guarantees uniqueness by appending a system-generated auto-incrementing column to
+          the non-unique primary key columns. The tuple represented by these columns cannot
+          contain any NULL values and can never be updated once inserted. For a Kudu table, all
+          the partition key columns must come from the set of primary key columns. </p>
 
         <p>
           The primary key has both physical and logical aspects:
@@ -232,14 +322,13 @@ under the License.
             </p>
           </li>
           <li>
-            <p>
-              On the logical side, the uniqueness constraint allows you to avoid duplicate data in a table.
-              For example, if an <codeph>INSERT</codeph> operation fails partway through, only some of the
-              new rows might be present in the table. You can re-run the same <codeph>INSERT</codeph>, and
-              only the missing rows will be added. Or if data in the table is stale, you can run an
-              <codeph>UPSERT</codeph> statement that brings the data up to date, without the possibility
-              of creating duplicate copies of existing rows.
-            </p>
+            <p> You can insert rows with duplicate key values using an INSERT statement. Each
+              row saved in the Kudu table is made unique by the system-generated
+              auto-incrementing column, so when the primary key is non-unique, duplicate key
+              values do not cause the insertion to fail. However, if an INSERT operation on
+              such a table fails partway through, all rows except those with write errors are
+              added to the table. Duplicated rows are stored with different values in the
+              auto-incrementing column. </p>
           </li>
         </ul>
 
@@ -273,7 +362,7 @@ under the License.
         </p>
 
 <codeblock>
-  PRIMARY KEY
+[NON UNIQUE] PRIMARY KEY
 | [NOT] NULL
 | ENCODING <varname>codec</varname>
 | COMPRESSION <varname>algorithm</varname>
@@ -300,7 +389,9 @@ under the License.
             combination of values for the columns.
           </p>
 
-          <p conref="../shared/impala_common.xml#common/pk_implies_not_null"/>
+          <p>Because all of the primary key columns must have non-null values, specifying a column
+            in the PRIMARY KEY or NON UNIQUE PRIMARY KEY clause implicitly adds the NOT NULL
+            attribute to that column.</p>
 
           <p>
             The primary key columns must be the first ones specified in the <codeph>CREATE
@@ -331,6 +422,21 @@ CREATE TABLE pk_at_end
   col3 BOOLEAN,
   PRIMARY KEY (col1)
 ) PARTITION BY HASH(col1) PARTITIONS 2 STORED AS KUDU;
+
+CREATE TABLE pk_inline_non_unique
+(
+col1 BIGINT NON UNIQUE PRIMARY KEY,
+col2 STRING,
+col3 BOOLEAN
+) PARTITION BY HASH(col1) PARTITIONS 2 STORED AS KUDU;
+
+CREATE TABLE pk_at_end_non_unique
+(
+col1 BIGINT,
+col2 STRING,
+col3 BOOLEAN,
+NON UNIQUE PRIMARY KEY (col1)
+) PARTITION BY HASH(col1) PARTITIONS 2 STORED AS KUDU;
 </codeblock>
 
           <p>
@@ -373,11 +479,10 @@ SHOW CREATE TABLE inline_pk_rewritten;
 +------------------------------------------------------------------------------+
 </codeblock>
 
-          <p>
-            The notion of primary key only applies to Kudu tables. Every Kudu table requires a
+          <p> The notion of primary key only applies to Kudu tables. Every Kudu table requires a
             primary key. The primary key consists of one or more columns. You must specify any
-            primary key columns first in the column list.
-          </p>
+            primary key columns first in the column list, or partition by the leading columns of
+            the table so that they can be promoted to a non-unique primary key. </p>
 
           <p>
             The contents of the primary key columns cannot be changed by an