Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/04/01 04:25:00 UTC

[jira] [Commented] (KUDU-1945) Support generation of surrogate primary keys (or tables with no PK)

    [ https://issues.apache.org/jira/browse/KUDU-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707500#comment-17707500 ] 

ASF subversion and git services commented on KUDU-1945:
-------------------------------------------------------

Commit 8273792156d26c46f788558c896a0729ac2fedd1 in kudu's branch refs/heads/master from Abhishek Chennaka
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=827379215 ]

KUDU-1945 Update auto incrementing counter during bootstrap

The auto incrementing counter is reset to zero when the tablet is
initialized, so bootstrap has to restore its correct value. There
are two cases:

1. WAL segments contain insert operations
In this scenario the WAL segments are replayed. Each insert operation
entry carries the auto incrementing counter used for the insert
operations in that entry, so as long as the latest insert operation
entry is present in the WAL segments, the auto incrementing counter
is populated correctly during bootstrap.

2. WAL segments do not contain insert operations
In this case, we need to fetch the highest auto incrementing counter
value present in the data which is already flushed and update the
in-memory auto incrementing counter appropriately. This patch
accomplishes this task.
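
The two cases above can be sketched as a toy model in Python. This is
not Kudu's actual C++ implementation: WalInsertEntry, recover_counter
and the field names are hypothetical, and the model assumes that an
entry's recorded counter plus its number of inserts gives the counter
value after replay, and that an empty tablet yields a counter of 0.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class WalInsertEntry:
    """Hypothetical stand-in for a WAL entry holding insert operations."""
    start_counter: int  # counter recorded in the entry (assumed semantics)
    num_inserts: int    # number of insert operations in the entry


def recover_counter(wal_entries: Sequence[WalInsertEntry],
                    flushed_values: Iterable[int]) -> int:
    """Return the counter value after bootstrap in this toy model."""
    if wal_entries:
        # Case 1: the latest insert entry in the WAL determines the counter.
        last = wal_entries[-1]
        return last.start_counter + last.num_inserts
    # Case 2: no insert entries in the WAL, so fall back to the highest
    # value found in already-flushed data. With no rows left, the counter
    # is modeled as reset to 0.
    return max(flushed_values, default=0)
```

In this sketch the flushed data is only scanned when the WAL holds no
insert entries, which mirrors the split between the two cases.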

There are tests for the bootstrap scenarios where
1. We have no WAL segments with an INSERT OP containing the auto
incrementing column value. We rely on the auto incrementing counter
present in the data directories in this case.
2. We have no WAL segments with auto incrementing column value
and all the rows of the table are deleted. We reset the auto
incrementing counter in this case.
3. We have non-committed replicate ops containing INSERT OPs with
the auto incrementing column values.

Manually tested the time taken to populate the auto incrementing
counter:
Columns - A Non Unique Primary Key column of type INT32
        - 8 INT64 columns
        - 5 STRING columns
Rows - 500k rows with data populated
Time taken in populating the counter during bootstrap:
Min - 235ms
Max - 466ms
Median - 312ms

The total time spent bootstrapping the tablet was between 18 and 25
seconds.

Change-Id: I61b305efd7d5a065a2976327567163956c0c2184
Reviewed-on: http://gerrit.cloudera.org:8080/19445
Reviewed-by: Alexey Serbin <al...@apache.org>
Tested-by: Kudu Jenkins


> Support generation of surrogate primary keys (or tables with no PK)
> -------------------------------------------------------------------
>
>                 Key: KUDU-1945
>                 URL: https://issues.apache.org/jira/browse/KUDU-1945
>             Project: Kudu
>          Issue Type: New Feature
>          Components: client, master, tablet
>            Reporter: Todd Lipcon
>            Priority: Major
>              Labels: roadmap-candidate
>
> Many use cases have data where there is no "natural" primary key. For example, a web log use case mostly cares about partitioning and not about precise sorting by timestamp, and timestamps themselves are not necessarily unique. Rather than forcing users to come up with their own surrogate primary keys, Kudu should support some kind of "auto_increment" equivalent which generates primary keys on insertion. Alternatively, Kudu could support tables which are partitioned but not internally sorted.
> The advantages would be:
> - Kudu can pick primary keys on insertion to guarantee that there is no compaction required on the table (eg always assign a new key higher than any existing key in the local tablet). This can improve write throughput substantially, especially compared to naive PK generation schemes that a user might pick such as UUID, which would generate a uniform random-insert workload (worst case for performance)
> - Make Kudu easier to use for such use cases (no extra client code necessary)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)